Flexible Byzantine Fault Tolerance

Dahlia Malkhi* (Calibra), dahlia.malkhi@gmail.com
Kartik Nayak* (Duke University), kartik@cs.duke.edu
Ling Ren* (University of Illinois, Urbana-Champaign), renling@illinois.edu

ABSTRACT

This paper introduces Flexible BFT, a new approach for BFT consensus solution design revolving around two pillars, stronger resilience and diversity. The first pillar, stronger resilience, involves a new fault model called alive-but-corrupt faults. Alive-but-corrupt replicas may arbitrarily deviate from the protocol in an attempt to break safety of the protocol. However, if they cannot break safety, they will not try to prevent liveness of the protocol. Combining alive-but-corrupt faults into the model, Flexible BFT is resilient to higher corruption levels than possible in a pure Byzantine fault model. The second pillar, diversity, designs consensus solutions whose protocol transcript is used to draw different commit decisions under diverse beliefs. With this separation, the same Flexible BFT solution supports synchronous and asynchronous beliefs, as well as varying resilience threshold combinations of Byzantine and alive-but-corrupt faults.

At a technical level, Flexible BFT achieves the above results using two new ideas. First, it introduces a synchronous BFT protocol in which only the commit step requires knowledge of the network delay bound and thus replicas execute the protocol without any synchrony assumption. Second, it introduces a notion called Flexible Byzantine Quorums by dissecting the roles of different quorums in existing consensus protocols.

CCS CONCEPTS
• Security and privacy → Distributed systems security.

KEYWORDS
Distributed computing, Byzantine Fault Tolerance, Synchrony

ACM Reference Format:
Dahlia Malkhi, Kartik Nayak, and Ling Ren. 2019. Flexible Byzantine Fault Tolerance. In 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS ’19), November 11–15, 2019, London, United Kingdom. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3319535.3354225

* The work was done when the authors were working at VMware Research.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CCS ’19, November 11–15, 2019, London, United Kingdom
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6747-9/19/11...$15.00
https://doi.org/10.1145/3319535.3354225

1 INTRODUCTION

Byzantine fault tolerant (BFT) protocols are used to build replicated services [24, 33, 34]. Recently, they have received revived interest as the algorithmic foundation of what is known as decentralized ledgers, or blockchains.

In the classic approach to BFT protocol designs, a protocol designer or a service administrator first picks a set of assumptions (e.g., the fraction of Byzantine faults and certain timing assumptions) and then devises a protocol (or chooses an existing one) tailored for that particular setting. The assumptions made by the protocol designer are imposed upon all parties involved: every replica maintaining the service as well as every client using the service (also known as the “learner” role). Such a protocol collapses if deployed under settings that differ from the one it is designed for. In particular, optimal-resilience partially synchronous solutions [11, 14] break (lose safety and liveness) if the fraction of Byzantine faults exceeds 1/3. Similarly, optimal-resilience synchronous solutions [1, 18] do not obtain safety or liveness if the fraction of Byzantine faults exceeds 1/2 or if the synchrony bound is violated.

In this work, we introduce a new approach for BFT protocol design called Flexible BFT, offering advantages in the two aspects above. We elaborate on the two aspects below.

Stronger resilience. We introduce a mixed fault model with a new type of fault called alive-but-corrupt (a-b-c for short) faults. Alive-but-corrupt replicas actively try to disrupt the system from maintaining a safe consensus decision and they might arbitrarily deviate from the protocol for this purpose. However, if they cannot break safety, they will not try to prevent the system from reaching a (safe) decision. The rationale for this new type of fault is that violating safety may provide the attacker gains (e.g., a double spend attack) but preventing liveness usually does not. In fact, a-b-c replicas may gain rewards from keeping the replicated service live, e.g., by collecting service fees. We show a family of protocols that tolerate a combination of Byzantine and a-b-c faults that exceeds 1/3 in the partially synchronous model and exceeds 1/2 in the synchronous model. Our results do not violate existing resilience bounds because the fraction of Byzantine faults is always smaller than the respective bounds.

Diversity. The Flexible BFT approach further provides certain separation between the fault model and the protocol. The design approach builds a protocol whose transcript can be interpreted by learners with diverse beliefs, who draw different consensus commit decisions based on their beliefs. Flexible BFT guarantees safety (agreement) and liveness for all learners that have correct beliefs. Each learner specifies (i) the fault threshold it needs to tolerate, and (ii) the message delay bound, if any, it believes in. For example, one instance of Flexible BFT can support a learner that requires tolerance against 1/5 Byzantine faults plus 3/10 a-b-c faults, while simultaneously supporting another learner who requires tolerance against 1/10 Byzantine faults plus 1/2 a-b-c faults, and a third
learner who believes in synchrony and requires 3/10 Byzantine plus 2/5 a-b-c tolerance.

This novel separation of fault model from protocol design can be useful in practice in several ways. First, different learners may naturally hold different assumptions about the system. Some learners may be more cautious and require a higher resilience than others; some learners may believe in synchrony while others do not. Moreover, even the same learner may assume a larger fraction of faults when dealing with a $1M transaction compared to a $5 one. The rationale is that more replicas may be willing to collude to double spend a high-value transaction. In this case, the learner can wait for more votes before committing the $1M transaction. Last but not least, a learner may update its assumptions based on certain events it observes. For example, if a learner receives votes for conflicting values, which may indicate an attempt at attacking safety, it can start requiring more votes than usual; if a learner who believes in synchrony notices abnormally long message delays, which may indicate an attack on network infrastructure, it can update its synchrony bound to be more conservative or switch to a partial-synchrony assumption.

The notion of “commit” needs to be clarified in our new model. Learners in Flexible BFT have different assumptions and hence different commit rules. It is then possible and common that a value is committed by one learner but not another. Flexible BFT guarantees that any two learners whose assumptions are correct (but possibly different) commit to the same value. If a learner’s assumption is incorrect, however, it may commit inconsistent values which may later be reverted. While this new notion of commit may sound radical at first, it is the implicit behavior of existing BFT protocols. If the assumption made by the service administrator is violated in a classic BFT protocol (e.g., there are more Byzantine faults than provisioned), learners may commit to different values and they have no recourse. In this sense, Flexible BFT is a robust generalization of classic BFT protocols. In Flexible BFT, if a learner performs conflicting commits, it should update its assumption to be more cautious and re-interpret what values are committed under its new assumption. In fact, this “recovery” behavior is somewhat akin to Bitcoin. A learner in Bitcoin decides how many confirmations are needed (i.e., how “deeply buried”) to commit a block. If the learner commits but subsequently an alternative longer fork appears, its commit is reverted. Going forward, the learner may increase the number of confirmations it requires.

Key techniques. Flexible BFT centers around two new techniques. The first one is a novel synchronous BFT protocol with replicas executing at network speed; that is, the protocol run by the replicas does not assume synchrony. This allows learners in the same protocol to assume different message delay bounds and commit at their own pace. The protocol thus separates timing assumptions of replicas from timing assumptions of learners. Note that this is only possible via Flexible BFT’s separation of protocol from the fault model: the action of committing is only carried out by learners, not by replicas. The other technique involves a breakdown of the different roles that quorums play in different steps of partially synchronous BFT protocols. Once again, made possible by the separation in Flexible BFT, we will use one quorum size for replicas to run a protocol, and let learners choose their own quorum sizes for committing in the protocol.

Contributions. To summarize, our work has the following contributions.

(1) Alive-but-corrupt faults. We introduce a new type of fault, called alive-but-corrupt faults, which attack safety but not liveness.

(2) Synchronous BFT with network speed replicas. We present a synchronous protocol in which only the commit step requires synchrony. Since replicas no longer perform commits in our approach, the protocol simultaneously supports learners assuming different synchrony bounds.

(3) Flexible Byzantine Quorums. We deconstruct existing BFT protocols to understand the role played by different quorums and introduce the notion of Flexible Byzantine Quorums. A protocol based on Flexible Byzantine Quorums simultaneously supports learners assuming different fault models.

(4) One BFT consensus solution for the populace. Putting the above together, we present a new approach for BFT design, Flexible BFT. Our approach has stronger resilience and diversity: Flexible BFT tolerates a fraction of combined (Byzantine plus a-b-c) faults beyond existing resilience bounds. And learners with diverse fault and timing beliefs are supported in the same protocol.

Organization. The rest of the paper is organized as follows. Section 2 defines the Flexible BFT model where replicas and learners are separated. We will describe in more detail our key techniques for synchrony and partial-synchrony in Sections 3 and 4, respectively. Section 5 puts these techniques together and presents the final protocol. Section 6 discusses the result obtained by the Flexible BFT design and Section 7 describes related work.

2 MODELING FLEXIBLE BFT

The goal of Flexible BFT is to build a replicated service that takes requests from learners and provides learners an interface of a single non-faulty server, i.e., it provides learners with the same totally ordered sequence of values. Internally, the replicated service uses multiple servers, also called replicas, to tolerate some number of faulty servers. The total number of replicas is denoted by n. In this paper, whenever we speak about a set of replicas or messages, we denote the set size as its fraction over n. For example, we refer to a set of m replicas as “q replicas” where q = m/n.

Borrowing notation from Lamport [23], such a replicated service has three logical actors: proposers capable of sending new values, acceptors who add these values to a totally ordered sequence (called a blockchain), and learners who decide on a sequence of values based on the transcript of the protocol and execute them on a state machine. Existing replication protocols provide the following two properties:

- Safety. Any two learners learn the same sequence of values.
- Liveness. A value proposed by a proposer will eventually be executed by every learner.

In existing replication protocols, the learners are assumed to be uniform, i.e., they interpret a transcript using the same rules and hence decide on the same sequence of values. In Flexible BFT, we consider diverse learners with different assumptions. Based on their own assumptions, they may interpret the transcript of the protocol differently. We show that as long as the assumptions of two different learners are both correct, they will eventually learn the same sequence of values. In the Flexible BFT approach, safety and liveness guarantees are defined with respect to learners.

- Safety for diverse learners. Any two learners with correct but potentially different assumptions learn the same sequence of values.
- Liveness for diverse learners. A value proposed by a proposer will eventually be executed by every learner with a correct assumption.

Fault model. We assume two types of faults within the replicas: Byzantine and alive-but-corrupt (a-b-c for short). Byzantine replicas behave arbitrarily. On the other hand, the goal of a-b-c replicas is to attack safety but to preserve liveness. These replicas will take any actions that help them break safety of the protocol. However, if they cannot succeed in breaking safety, they will help provide liveness. Consequently, in this new fault model, the safety proof should treat a-b-c replicas similarly to Byzantine. Then, once safety is proved, the liveness proof can treat a-b-c replicas similarly to honest. We assume that the adversary is static, i.e., the adversary determines which replicas are Byzantine and a-b-c before the start of the protocol.

Other assumptions. We assume hash functions, digital signatures and a public-key infrastructure (PKI). We use ⟨x⟩_R to denote a message x signed by a replica R. We assume pair-wise communication channels between replicas. We assume that all replicas have clocks that advance at the same rate.

3 SYNCHRONOUS BFT WITH NETWORK SPEED REPLICAS - OVERVIEW

Early synchronous protocols [13, 20] have relied on synchrony in two ways. First, the replicas assume a maximum network delay ∆ for communication between them. Second, they require a lock step execution, i.e., all replicas are in the same round at the same time. Hanke et al. showed a synchronous protocol without lock step execution [18]. Their protocol still contains a synchronous step in which all replicas perform a blocking wait of 2∆ time before proceeding to subsequent steps. Sync HotStuff [4] improves on it further to remove replicas’ blocking waits during good periods (when the leader is honest), but blocking waits are still required by replicas during bad situations (view changes).

In this section, we show a synchronous protocol where the replicas do not ever have blocking waits and execute at the network speed. In other words, replicas run a partially synchronous protocol and do not rely on synchrony at any point. Learners, on the other hand, rely on synchrony bounds to commit. This separation is what allows our protocol to support learners with different assumptions on the value of ∆. To the best of our knowledge, this is the first synchronous protocol to achieve such a separation. In addition, the protocol tolerates a combined Byzantine plus a-b-c fault ratio greater than a half (Byzantine fault tolerance is still less than half).

For simplicity, in this overview, we show a protocol for single shot consensus. In our final protocol in Section 5, we will consider a pipelined version of the protocol for consensus on a sequence of values. We do not consider termination for the single-shot consensus protocol in this overview because our final replication protocol is supposed to run forever.

The protocol is shown in Figure 1. It runs in a sequence of views. Each view has a designated leader who may be selected in a round robin order. The leader drives consensus in that view. In each view, the protocol runs in two steps: propose and vote. In the propose step, the leader proposes a value b. In the vote step, replicas vote for the value if it is safe to do so. The vote also acts as a re-proposal of the value. If a replica observes a set of q_r votes on b, called a certificate C^{q_r}(b), it “locks” on b. For now, we assume q_r = 1/2. (To be precise, q_r is slightly larger than 1/2, e.g., f + 1 out of 2f + 1.) We will revisit the choice of q_r in Section 6. In subsequent views, a replica will not vote for a value other than b unless it learns that q_r replicas are not locked on b. In addition, the replicas switch views (i.e., change leader) if they either observe an equivocation or if they do not receive a proposal from the leader within some timeout. A learner commits b if q_r replicas state that there exists a view in which b is certified and no equivocating value or view change was observed at a time before 2∆ after it was certified. Here, ∆ is the maximum network delay the learner believes in.

The protocol ensures safety if there are fewer than q_r faulty replicas. The key argument for safety is the following: If an honest replica h satisfies the commit condition for some value b in a view, then (a) no other value can be certified and (b) all honest replicas are locked on b at the end of that view. To elaborate, satisfying the commit condition implies that some honest replica h has observed an undisturbed-2∆ period after it locked on b, i.e., it did not observe an equivocation or a view change. Suppose the condition is satisfied at time t. This implies that other replicas did not observe an equivocation or a view change before t − ∆. The two properties above hold if the quorum honesty conditions described below hold. For liveness, if Byzantine leaders equivocate or do not propose a safe value, they will be blamed by both honest and a-b-c replicas and a view change will ensue. Eventually there will be an honest or a-b-c leader to drive consensus if quorum availability holds.

Quorum honesty (a) within a view. Since the undisturbed period starts after b is certified, h must have voted (and re-proposed) b at a time earlier than t − 2∆. Every honest replica must have received b before t − ∆. Since they had not voted for an equivocating value by then, they must have voted for b. Since the number of faults is less than q_r, every certificate needs to contain an honest replica’s vote. Thus, no certificate for any other value can be formed in this view.

Quorum honesty (b) across views. h sends C_v^{q_r}(b) at time t − 2∆. All honest receive C_v^{q_r}(b) by time t − ∆ and become locked
Protocol executed by the replicas.
(1) Propose. The leader L of view v proposes a value b.
(2) Vote. On receiving the first value b in a view v, a replica broadcasts b and votes for b if it is safe to do so, as determined by a locking mechanism described later. The replica records the following.
    - If the replica collects q_r votes on b, denoted as C_v^{q_r}(b) and called a certificate of b from view v, then it “locks” on b and records the lock time as t-lock_v.
    - If the replica observes an equivocating value signed by L at any time after entering view v, it records the time of equivocation as t-equiv_v. It blames the leader by broadcasting ⟨blame, v⟩ and the equivocating values.
    - If the replica does not receive a proposal for sufficient time in view v, it times out and broadcasts ⟨blame, v⟩.
    - If the replica collects a set of q_r ⟨blame, v⟩ messages, it records the time as t-viewchange_v, broadcasts them and enters view v + 1.
If a replica locks on a value b in a view, then it votes only for b in subsequent views unless it “unlocks” from b by learning that q_r replicas are not locked on b in that view or higher views (they may be locked on other values or they may not be locked at all).

Commit rules for learners. A value b is said to be committed by a learner assuming ∆-synchrony iff q_r replicas each report that there exists a view v such that,
(1) b is certified, i.e., C_v^{q_r}(b) exists.
(2) the replica observed an undisturbed-2∆ period after certification, i.e., no equivocating value or view change was observed at a time before 2∆ after it was certified, or more formally, min(current-time, t-equiv_v, t-viewchange_v) − t-lock_v ≥ 2∆.

Figure 1: Synchronous BFT with network speed replicas.
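
The learner-side commit rule of Figure 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `ViewReport` record, the function names, the integer `quorum` count used in place of the fraction q_r, and the restriction to reports about one fixed view v are all assumptions made here for illustration.

```python
from dataclasses import dataclass
from math import inf
from typing import Mapping


@dataclass(frozen=True)
class ViewReport:
    """One replica's report about value b in a fixed view v (hypothetical record)."""
    certified: bool             # True if the replica holds a certificate for b from view v
    t_lock: float               # t-lock_v: time at which the replica locked on b
    t_equiv: float = inf        # t-equiv_v: time of an observed equivocation (inf if none)
    t_viewchange: float = inf   # t-viewchange_v: time of the view change (inf if none)


def undisturbed(report: ViewReport, delta: float, now: float) -> bool:
    """Check min(current-time, t-equiv_v, t-viewchange_v) - t-lock_v >= 2 * delta."""
    if not report.certified:
        return False
    disturbed_at = min(now, report.t_equiv, report.t_viewchange)
    return disturbed_at - report.t_lock >= 2 * delta


def learner_commits(reports: Mapping[str, ViewReport], quorum: int,
                    delta: float, now: float) -> bool:
    """A learner believing in delay bound `delta` commits b once at least
    `quorum` replicas report a certified, undisturbed-2*delta period."""
    supporting = sum(1 for r in reports.values() if undisturbed(r, delta, now))
    return supporting >= quorum


# Example with three replicas and a quorum of two (f + 1 out of 2f + 1, f = 1).
reports = {
    "r1": ViewReport(certified=True, t_lock=5.0),
    "r2": ViewReport(certified=True, t_lock=5.0, t_equiv=6.0),
    "r3": ViewReport(certified=True, t_lock=7.5, t_viewchange=9.5),
}
print(learner_commits(reports, quorum=2, delta=1.0, now=10.0))
print(learner_commits(reports, quorum=2, delta=2.0, now=10.0))
```

The two calls evaluate the same replica reports under two different beliefs about ∆. Only the learner's parameter changes; nothing in the replica-side data depends on ∆, which mirrors the separation the figure describes.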

on b. For an honest replica to unlock from b in subsequent views, q_r replicas need to claim that they are not locked on b. At least one of them is honest and would need to falsely claim it is not locked, which cannot happen.

Quorum availability. Byzantine replicas do not exceed 1 − q_r so that q_r replicas respond to the leader.

Tolerating a-b-c faults. If we have only honest and Byzantine replicas (and no a-b-c replicas), quorum honesty requires the fraction of Byzantine replicas B < q_r. Quorum availability requires B ≤ 1 − q_r. If we optimize for maximizing B, we obtain q_r = 1/2. Now, suppose P represents the fraction of a-b-c replicas. Quorum honesty requires B + P < q_r, and quorum availability requires B ≤ 1 − q_r. Thus, the protocol supports varying values of B and P at different values of q_r > 1/2 such that safety and liveness are both preserved.

Separating learner synchrony assumption from the replica protocol. The most interesting aspect of this protocol is the separation of the learner commit rule from the protocol design. In particular, although this is a synchronous protocol, the replica protocol does not rely on any synchrony bound. This allows learners to choose their own message delay bounds. Any learner that uses a correct message delay bound enjoys safety.

4 FLEXIBLE BYZANTINE QUORUMS FOR PARTIAL SYNCHRONY - OVERVIEW

In this section, we explain the high-level insights of Flexible Byzantine Quorums in Flexible BFT. Again, for ease of exposition, we focus on a single-shot consensus and do not consider termination. We start by reviewing the Byzantine Quorum Systems [28] that underlie existing partially synchronous protocols that tolerate 1/3 Byzantine faults (Section 4.1). We will illustrate that multiple uses of 2/3-quorums actually serve different purposes in these protocols. We then generalize these protocols to use Flexible Byzantine Quorums (Section 4.2), the key idea that enables more than 1/3 fault tolerance and allows diverse learners with varying assumptions to co-exist.

4.1 Background: Quorums in PBFT

Existing protocols for solving consensus in the partially synchronous setting with optimal 1/3-resilience revolve around voting by Byzantine quorums of replicas. Two properties of Byzantine quorums are utilized for achieving safety and liveness. First, any two quorums intersect at one honest replica (quorum intersection). Second, there exists a quorum that contains no Byzantine faulty replicas (quorum availability). Concretely, when less than 1/3 of the replicas are Byzantine, quorums are set to size q_r = 2/3. (To be precise, q_r is slightly larger than 2/3, i.e., 2f + 1 out of 3f + 1 where f is the number of faults, but we will use q_r = 2/3 for ease of exposition.) This guarantees an intersection of size at least 2q_r − 1 = 1/3, hence at least one honest replica in the intersection. As for availability, there exist q_r = 2/3 honest replicas to form a quorum.

To dissect the use of quorums in BFT protocols, consider their use in PBFT [11] for providing safety and liveness. PBFT operates in a view-by-view manner. Each view has a unique leader and consists of the following steps:

- Propose. A leader L proposes a value b.
- Vote 1. On receiving the first value b for a view v, a replica votes for b if it is safe, as determined by a locking mechanism described below. A set of q_r votes form a certificate C^{q_r}(b).
- Vote 2. On collecting C^{q_r}(b), a replica “locks” on b and votes for C^{q_r}(b).
- Commit. On collecting q_r votes for C^{q_r}(b), a learner learns that proposal b becomes a committed decision.

If a replica locks on a value b in a view, then it votes only for b in subsequent views unless it “unlocks” from b. A replica “unlocks” from b if it learns that q_r replicas are not locked on b in that view or higher (they may be locked on other values or they may not be locked at all).

The properties of Byzantine quorums are harnessed in PBFT for safety and liveness as follows:

Quorum intersection within a view. Safety within a view is ensured by the first round of votes. A replica votes only once per view. For two distinct values to both obtain certificates, one honest replica needs to vote for both, which cannot happen.

Quorum intersection across views. Safety across views is ensured by the locking mechanism. If b becomes a committed decision in a view, then a quorum of replicas lock on b in that view. For an honest replica among them to unlock from b, a quorum of replicas need to claim they are not locked on b. At least one replica in the intersection is honest and would need to falsely claim it is not locked, which cannot happen.

Quorum availability within a view. Liveness within each view is guaranteed by having an honest quorum respond to a non-faulty leader.

4.2 Flexible Byzantine Quorums

Our Flexible BFT approach separates the quorums used in BFT protocols for the replicas (acceptors) from the quorums used for learning when a decision becomes committed. More specifically, we denote the quorum used for forming certificates (locking) by q_lck and the quorum used for unlocking by q_ulck. We denote the quorum employed by learners for learning certificate uniqueness by q_unq, and the quorum used for learning commit safety by q_cmt. In other words, learners mandate q_unq first-round votes and q_cmt second-round votes in order to commit a decision. Below, we outline a modified PBFT-like protocol that uses these different quorum sizes instead of a single quorum size q. We then introduce a new definition, Flexible Byzantine Quorums, that captures the requirements needed for these quorums to provide safety and liveness.

- Propose. A leader L proposes a value b.
- Vote 1. On receiving the first value b for a view v, a replica votes for b if it is safe, as determined by a locking mechanism described below. A set of q_lck votes forms a certificate C^{q_lck}(b).
- Vote 2. On collecting C^{q_lck}(b), a replica “locks” on b and votes for C^{q_lck}(b).
- Commit. On collecting q_unq votes for b and q_cmt votes for C^{q_lck}(b), a learner learns that proposal b becomes a committed decision.
If a replica locks on a value b in a view, then it votes only for b in subsequent views unless it “unlocks” from b by learning that q_ulck replicas are not locked on b.

Flexible quorum intersection (a) within a view. Contrary to PBFT, in Flexible BFT, a pair of q_lck certificates need not intersect in an honest replica. Indeed, locking on a value does not preclude conflicting locks. Instead, this property mandates that every q_lck quorum intersects with every q_unq quorum at at least one honest replica. This property ensures that, if a learner commits a value, it is the only certified value within the view. This property requires the fraction of faulty replicas to be less than q_lck + q_unq − 1.

Flexible quorum intersection (b) across views. If a learner commits a value b in a view, q_cmt replicas lock on b in that view. For an honest replica among them to unlock from b, q_ulck replicas need to claim they are not locked on b. This property mandates that every q_ulck quorum intersects with every q_cmt quorum at at least one honest replica. This property ensures that, if a learner commits a value, then replicas who have locked on the value cannot be unlocked from it. This property requires the fraction of faulty replicas to be less than q_ulck + q_cmt − 1.

Flexible quorum availability within a view. For liveness, Byzantine replicas cannot exceed 1 − max(q_unq, q_cmt, q_lck, q_ulck) so that the aforementioned quorums can be formed at different stages of the protocol.

Given the above analysis, Flexible BFT ensures safety if the fraction of faulty replicas is less than min(q_unq + q_lck − 1, q_cmt + q_ulck − 1), and provides liveness if the fraction of Byzantine replicas is at most 1 − max(q_unq, q_cmt, q_lck, q_ulck). It is optimal to use balanced quorum sizes where q_lck = q_ulck and q_unq = q_cmt. To see this, first note that we should make sure q_unq + q_lck = q_cmt + q_ulck; otherwise, suppose the right-hand side is smaller, then setting (q_cmt, q_ulck) to equal (q_unq, q_lck) improves safety tolerance without affecting liveness tolerance. Next, observe that if we have q_unq + q_lck = q_cmt + q_ulck but q_lck > q_ulck (and hence q_unq < q_cmt), then once again setting (q_cmt, q_ulck) to equal (q_unq, q_lck) improves safety tolerance without affecting liveness tolerance.

Thus, in this paper, we set q_lck = q_ulck = q_r and q_unq = q_cmt = q_c. Since replicas use q_r votes to lock, these votes can always be used by the learners to commit q_cmt quorums. Thus, q_c ≥ q_r. The Flexible Byzantine Quorum requirements collapse into the following two conditions.

Flexible quorum intersection. The fraction of faulty replicas is < q_c + q_r − 1.

Flexible quorum availability. The fraction of Byzantine replicas is ≤ 1 − q_c.

Tolerating a-b-c faults. If all faults in the system are Byzantine faults, then the best parameter choice is q_c = q_r ≥ 2/3 for < 1/3 fault tolerance, and Flexible Byzantine Quorums degenerate to basic Byzantine quorums. However, in our model, a-b-c replicas are only interested in attacking safety but not liveness. This allows us to tolerate q_c + q_r − 1 total faults (Byzantine plus a-b-c), which can be more than 1/3. For example, if we set q_r = 0.7 and q_c = 0.8,
then such a protocol can tolerate 0.2 Byzantine faults plus 0.3 a-b-c faults. We discuss the choice for $q_r$ and $q_c$ and their rationale in Section 6.

Separating learner commit rules from the replica protocol. A key property of the Flexible Byzantine Quorum approach is that it decouples the BFT protocol from learner commit rules. The decoupling allows learners assuming different fault models to utilize the same protocol. In the above protocol, the propose and two voting steps are executed by the replicas and they are only parameterized by $q_r$. The commit step can be carried out by different learners using different commit thresholds $q_c$. Thus, a fixed $q_r$ determines a possible set of learners with varying commit rules (in terms of Byzantine and a-b-c adversaries). Recall that a Byzantine adversary can behave arbitrarily and thus may not provide liveness, whereas an a-b-c adversary only intends to attack safety but not liveness. Thus, a learner who believes that a large fraction of faulty replicas may attempt to break safety, not progress, can choose a larger $q_c$. By doing so, it seeks stronger safety against dishonest replicas, while trading liveness. Conversely, a learner that assumes that a large fraction of faulty replicas attack liveness must choose a smaller $q_c$.

5 FLEXIBLE BFT PROTOCOL
In this section, we combine the ideas presented in Sections 3 and 4 to obtain a final protocol that supports both types of learners. A learner can either assume partial synchrony, with freedom to choose $q_c$ as described in the previous section, or assume synchrony with its own choice of $\Delta$, as described in Section 3. Replicas execute a protocol at the network speed with a parameter $q_r$. We first give the protocol executed by the replicas and then discuss how learners commit depending on their assumptions. Moreover, inspired by Casper [9] and HotStuff [36], we show a protocol where the rounds of voting can be pipelined.

5.1 Notation
Before describing the protocol, we first define some data structures and terminology that will aid presentation.

Block format. The pipelined protocol forms a chain of values. We use the term block to refer to each value in the chain. We refer to a block's position in the chain as its height. A block $B_k$ at height $k$ has the following format

$$B_k := (b_k, h_{k-1})$$

where $b_k$ denotes a proposed value at height $k$ and $h_{k-1} := H(B_{k-1})$ is a hash digest of the predecessor block. The first block $B_1 = (b_1, \bot)$ has no predecessor. Every subsequent block $B_k$ must specify a predecessor block $B_{k-1}$ by including a hash of it. We say a block is valid if (i) its predecessor is valid or $\bot$, and (ii) its proposed value meets application-level validity conditions and is consistent with its chain of ancestors (e.g., does not double spend a transaction in one of its ancestor blocks).

Block extension and equivocation. We say $B_l$ extends $B_k$ if $B_k$ is an ancestor of $B_l$ ($l > k$). We say two blocks $B_l$ and $B'_{l'}$ equivocate one another if they are not equal and do not extend one another.

Certificates and certified blocks. In the protocol, replicas vote for blocks by signing them. We use $C_v^{q_r}(B_k)$ to denote a set of signatures on $h_k = H(B_k)$ by $q_r$ replicas in view $v$. $q_r$ is a parameter fixed for the protocol instance. We call $C_v^{q_r}(B_k)$ a certificate for $B_k$ from view $v$. Certified blocks are ranked first by the views in which they are certified and then by their heights. In other words, a block $B_k$ certified in view $v$ is ranked higher than a block $B_{k'}$ certified in view $v'$ if either (i) $v > v'$ or (ii) $v = v'$ and $k > k'$.

Locked blocks. At any time, a replica locks the highest certified block to its knowledge. During the protocol execution, each replica keeps track of all signatures for all blocks and keeps updating its locked block. Looking ahead, the notion of locked block will be used to guard the safety of a learner commit.

5.2 Replica Protocol
The replica protocol progresses in a view-by-view fashion. Each view has a designated leader who is responsible for driving consensus on a sequence of blocks. Leaders can be chosen statically, e.g., round robin, or randomly using more sophisticated techniques [10, 31]. In our description, we assume a round robin selection of leaders, i.e., replica $(v \bmod n)$ is the leader of view $v$.

At a high level, the protocol does the following: The leader proposes a block to all replicas. The replicas vote on it if safe to do so. The block becomes certified once $q_r$ replicas vote on it. The leader will then propose another block extending the previous one, chaining blocks one after another at increasing heights. Unlike regular consensus protocols where replicas determine when a block is committed, in Flexible BFT, replicas only certify blocks while committing is offloaded to the learners. If at any time replicas detect malicious leader behavior or lack of progress in a view, they blame the leader and engage in a view change protocol to replace the leader and move to the next view. The new leader collects a status from different replicas and continues to propose blocks based on this status. We explain the steady state and view change protocols in more detail below.

Steady state protocol. The steady state protocol is described in Figure 2. In the steady state, there is a unique leader who, in an iteration, proposes a block, waits for votes from $q_r$ replicas and moves to the next iteration. In the steady state, an honest leader always extends the previous block it proposed. Immediately after a view change, since the previous leaders could have been Byzantine and may have proposed equivocating blocks, the new leader needs to determine a safe block to propose. It does so by collecting a status of locked blocks from $q_r$ replicas denoted by $S$ (described in the view change protocol).

For a replica $R$ in the steady state, on receiving a proposal for block $B_k$, $R$ votes for it if it extends the previous proposed block in the view or if it extends the highest certified block in $S$. Replica $R$ can potentially receive blocks out of order and thus receive $B_k$ before its ancestor blocks. In this case, replica $R$ waits until it receives the ancestor blocks, verifies the validity of those blocks and $B_k$ before voting for $B_k$. In addition, replica $R$ records the following to aid a learner commit:
- Number of votes. It records the number of votes received for $B_k$ in view $v$ as $q_{B_k,v}$. Observe that votes are broadcast
  by all replicas and the number of votes for a block can be greater than $q_r$. $q_{B_k,v}$ will be updated each time the replica hears about a new vote in view $v$.
- Lock time. If $B_{k-1}$ was proposed in the same view $v$, it locks $B_{k-1}$ and records the locked time as $\text{t-lock}_{k-1,v}$.
- Equivocation time. If the replica ever observes an equivocating block at height $k$ in view $v$ through a proposal or vote, it stores the time of equivocation as $\text{t-equiv}_{k,v}$.

Looking ahead, the locked time $\text{t-lock}_{k-1,v}$ and equivocation time $\text{t-equiv}_{k-1,v}$ will be used by learners with synchrony assumptions to commit, and the number of votes $q_{B_k,v}$ will be used by learners with partial-synchrony assumptions to commit.

Let $v$ be the current view number and replica $L$ be the leader in this view. Perform the following steps in an iteration.

(1) Propose. ▷ Executed by the leader of view $v$
The leader $L$ broadcasts $\langle \text{propose}, B_k, v, C_{v'}^{q_r}(B_{k-1}), S \rangle_L$. Here, $B_k := (b_k, h_{k-1})$ is the newly proposed block and it should extend the highest certified block known to $L$. In the steady state, an honest leader $L$ would extend the previous block it proposed, in which case $v' = v$ and $S = \bot$. Immediately after a view change, $L$ determines the highest certified block from the status $S$ received during the view change.

(2) Vote. ▷ Executed by all replicas
When a replica $R$ receives a valid proposal $\langle \text{propose}, B_k, v, C_{v'}^{q_r}(B_{k-1}), S \rangle_L$ from the leader $L$, $R$ broadcasts the proposal and a vote $\langle \text{vote}, B_k, v \rangle_R$ if (i) the proposal is the first one in view $v$, and it extends the highest certified block in $S$, or (ii) the proposal extends the last proposed block in the view.
In addition, replica $R$ records the following based on the messages it receives.
- $R$ keeps track of the number of votes received for this block in this view as $q_{B_k,v}$.
- If block $B_{k-1}$ has been proposed in view $v$, $R$ marks $B_{k-1}$ as a locked block and records the locked time as $\text{t-lock}_{k-1,v}$.
- If a block equivocating $B_{k-1}$ is proposed by $L$ in view $v$ (possibly received through a vote), $R$ records the time $\text{t-equiv}_{k-1,v}$ at which the equivocating block is received.

The replica then enters the next iteration. If the replica observes no progress or equivocating blocks in the same view $v$, it stops voting in view $v$ and sends a $\langle \text{blame}, v \rangle_R$ message to all replicas.

Figure 2: Flexible BFT steady state protocol.

Leader monitoring. If a replica detects a lack of progress in view $v$ or observes malicious leader behavior such as more than one height-$k$ block in the same view, it blames the leader of view $v$ by broadcasting a $\langle \text{blame}, v \rangle$ message. It quits view $v$ and stops voting and broadcasting blocks in view $v$. To determine lack of progress, the replicas may simply guess a time bound for message arrival or use increasing timeouts for each view [11].

View change. The view change protocol is described in Figure 3. If a replica gathers $q_r$ $\langle \text{blame}, v \rangle$ messages from distinct replicas, it forwards them to all other replicas and enters a new view $v + 1$ (Step (i)). It records the time at which it received the blame certificate as $\text{t-viewchange}_v$. Upon entering a new view, a replica reports to the leader of the new view $L'$ its locked block and transitions to the steady state (Step (ii)). $q_r$ status messages form the status $S$. The first block $L'$ proposes in the new view should extend the highest certified block among these $q_r$ status messages.

5.3 Learner Commit Rules
As mentioned in the introduction, Flexible BFT supports learners with different assumptions. Learners in Flexible BFT learn the state of the protocol from the replicas and, based on their own assumptions, determine whether a block has been committed. Broadly, we support two types of learners: those who believe in synchrony and those who believe in partial synchrony.

5.3.1 Learners with Partial-Synchrony Assumptions (CR1). A learner with partial-synchrony assumptions deduces whether a block has been committed based on the number of votes received by a block. A block $B_l$ (together with its ancestors) is committed with parameter $q_c$ iff $B_l$ and its immediate successor both receive $\ge q_c$ votes in the same view.

Safety of CR1. A CR1 commit based on $q_c$ votes is safe against fewer than $q_c + q_r - 1$ faulty replicas (Byzantine plus a-b-c). Observe that if $B_l$ gets $q_c$ votes in view $v$, due to flexible quorum intersection, a conflicting block cannot be certified in view $v$, unless $\ge q_c + q_r - 1$ replicas are faulty. Moreover, $B_{l+1}$ extending $B_l$ has also received $q_c$ votes in view $v$. Thus, $q_c$ replicas lock block $B_l$ in view $v$. In subsequent views, honest replicas that have locked $B_l$ will only vote for a block that equals or extends $B_l$ unless they unlock. However, due to flexible quorum intersection, they will not unlock unless $\ge q_c + q_r - 1$ replicas are faulty. The proof of Lemma 1 formalizes this argument.

5.3.2 Learners with Synchrony Assumptions (CR2). Intuitively, a CR2 commit involves $q_r$ replicas collectively stating that no "bad event" happens within "sufficient time" in a view. Here, a bad event refers to either leader equivocation or view change (the latter indicates that sufficiently many replicas believe the leader is faulty) and the "sufficient time" is $2\Delta$, where $\Delta$ is a synchrony bound chosen by the learner.
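
The fault bounds that a CR1 learner obtains from its choice of $q_c$ are simple arithmetic on the two thresholds. The following is a minimal sketch, assuming thresholds are expressed as fractions of the replica set; the function names (`cr1_bounds`, `cr1_is_safe`, `cr2_is_safe`) are hypothetical and not part of the protocol. It uses the CR1 safety bound stated above and the liveness bound of at most $1 - q_c$ Byzantine faults discussed in Section 6.1.

```python
from fractions import Fraction


def cr1_bounds(q_r, q_c):
    """Return (safety_bound, liveness_bound) for a CR1 learner.

    safety_bound: the total fault fraction (Byzantine plus a-b-c) must be
    strictly below this value for a CR1 commit to be safe.
    liveness_bound: the Byzantine fraction must be at most this value for
    the learner to keep committing.
    """
    q_r, q_c = Fraction(q_r), Fraction(q_c)
    return q_c + q_r - 1, 1 - q_c


def cr1_is_safe(q_r, q_c, total_faults):
    """True if the total fault fraction is strictly below q_c + q_r - 1."""
    safety_bound, _ = cr1_bounds(q_r, q_c)
    return safety_bound > Fraction(total_faults)


def cr2_is_safe(q_r, total_faults):
    """True if the total fault fraction is strictly below q_r.

    Assumes the learner's synchrony bound Delta is correct.
    """
    return Fraction(q_r) > Fraction(total_faults)
```

For instance, `cr1_bounds(Fraction(2, 3), Fraction(2, 3))` corresponds to the classic partially synchronous setting, while raising $q_c$ toward 1 trades Byzantine tolerance for total fault tolerance.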
Let $L$ and $L'$ be the leaders of views $v$ and $v + 1$, respectively.

(i) New-view. Upon gathering $q_r$ $\langle \text{blame}, v \rangle$ messages, broadcast them and enter view $v + 1$. Record the time as $\text{t-viewchange}_v$.
(ii) Status. Suppose $B_j$ is the block locked by the replica. Send a status of its locked block to the leader $L'$ using $\langle \text{status}, v, B_j, C_{v'}^{q_r}(B_j) \rangle$ and transition to the steady state. Here, $v'$ is the view in which $B_j$ was certified.

Figure 3: Flexible BFT view change protocol.
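
The replica-side rules of Figures 2 and 3 can be sketched in a few lines. This is an illustrative sketch only, under stated assumptions: a block holds a direct reference to its parent instead of the hash $h_{k-1}$, signatures and validity checks are omitted, a status $S$ is modeled as a non-empty list of `(view_certified, block)` pairs, and all names (`Block`, `extends`, `equivocate`, `rank`, `highest_certified`, `should_vote`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    height: int
    value: str
    parent: Optional["Block"] = None  # stands in for the hash of the predecessor


def extends(b_l: Block, b_k: Block) -> bool:
    """True if b_k is a (strict) ancestor of b_l."""
    cur = b_l.parent
    while cur is not None:
        if cur == b_k:
            return True
        cur = cur.parent
    return False


def equivocate(a: Block, b: Block) -> bool:
    """Two blocks equivocate if they differ and neither extends the other."""
    return a != b and not extends(a, b) and not extends(b, a)


def rank(view: int, block: Block) -> Tuple[int, int]:
    """Certified blocks are ranked by view first, then by height."""
    return (view, block.height)


def highest_certified(status: List[Tuple[int, Block]]) -> Block:
    """Pick the highest-ranked certified block reported in the status S."""
    view, block = max(status, key=lambda entry: rank(entry[0], entry[1]))
    return block


def should_vote(proposal: Block,
                last_proposed_in_view: Optional[Block],
                status: List[Tuple[int, Block]]) -> bool:
    """Vote rule of Figure 2, step (2), without signature or validity checks."""
    if last_proposed_in_view is not None:
        # Case (ii): the proposal extends the last block proposed in this view.
        return extends(proposal, last_proposed_in_view)
    # Case (i): first proposal in the view, must extend the highest
    # certified block in S.
    return extends(proposal, highest_certified(status))
```

Ranking by the tuple `(view, height)` relies on Python's lexicographic tuple comparison, which matches the ordering defined for certified blocks in Section 5.1.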

(CR1) Partially-synchronous commit. A block $B_k$ is committed under the partially synchronous rule with parameter $q_c$ iff there exist $l \ge k$ and $v$ such that
(a) $C_v^{q_r}(B_l)$ and $C_v^{q_r}(B_{l+1})$ exist where $B_{l+1}$ extends $B_l$ and $B_k$ (if $l = k$, $B_l = B_k$).
(b) $q_{B_l,v} \ge q_c$ and $q_{B_{l+1},v} \ge q_c$.

(CR2) Synchronous commit. A block $B_k$ is committed assuming $\Delta$-synchrony iff the following holds for $q_r$ replicas. There exist $l \ge k$ and $v$ (possibly different across replicas) such that
(a) $C_v^{q_r}(B_l)$ exists where $B_l$ extends $B_k$ (if $l = k$, $B_l = B_k$).
(b) An undisturbed-$2\Delta$ period is observed after $B_{l+1}$ is obtained, i.e., no equivocating block or view change of view $v$ was observed before $2\Delta$ time after $B_{l+1}$ was obtained, that is,
$$\min(\text{current-time}, \text{t-equiv}_{l,v}, \text{t-viewchange}_v) - \text{t-lock}_{l,v} \ge 2\Delta$$

Figure 4: Flexible BFT commit rules
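
The two commit rules of Figure 4 can be sketched from the learner's side as follows. This is a minimal sketch under simplifying assumptions, not the authors' implementation: a replica's report is modeled as a hypothetical `ReplicaRecord` whose dictionaries are keyed by `(height, view)`, heights are assumed to index a single chain (so "extends" reduces to a larger height), vote counts are fractions of the replica set, and certificate verification is replaced by checking that a block has at least $q_r$ votes.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ReplicaRecord:
    # (height, view) -> fraction of replicas that voted for the block
    votes: Dict[Tuple[int, int], float] = field(default_factory=dict)
    # (height, view) -> time at which the block at that height was locked
    t_lock: Dict[Tuple[int, int], float] = field(default_factory=dict)
    # (height, view) -> time at which an equivocation was observed
    t_equiv: Dict[Tuple[int, int], float] = field(default_factory=dict)
    # view -> time at which the blame certificate for the view was received
    t_viewchange: Dict[int, float] = field(default_factory=dict)


def cr1_commits(rec: ReplicaRecord, k: int, q_r, q_c) -> bool:
    """CR1: some B_l (l >= k) and B_{l+1} are certified in the same view
    and both received at least q_c votes."""
    need = max(q_r, q_c)  # certified (>= q_r) and enough votes (>= q_c)
    return any(
        l >= k and frac >= need and rec.votes.get((l + 1, v), 0) >= need
        for (l, v), frac in rec.votes.items()
    )


def cr2_replica_ok(rec: ReplicaRecord, k: int, q_r, delta, now) -> bool:
    """One replica's view of CR2: an undisturbed 2*delta period after locking
    some certified B_l with l >= k."""
    for (l, v), locked_at in rec.t_lock.items():
        if k > l or q_r > rec.votes.get((l, v), 0):
            continue
        period_end = min(
            now,
            rec.t_equiv.get((l, v), math.inf),   # infinity if no equivocation
            rec.t_viewchange.get(v, math.inf),   # infinity if no view change
        )
        if period_end - locked_at >= 2 * delta:
            return True
    return False


def cr2_commits(records: List[ReplicaRecord], k: int, q_r, delta, now) -> bool:
    """CR2: at least a q_r fraction of replicas report an undisturbed period."""
    ok = sum(cr2_replica_ok(r, k, q_r, delta, now) for r in records)
    return ok >= q_r * len(records)
```

Note that only `cr1_commits` depends on $q_c$ and only the CR2 functions depend on `delta`, which mirrors how each learner applies its own parameters to the same replica-side records.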

More formally, a replica states that a synchronous commit for block $B_k$ for a given parameter $\Delta$ (set by a learner) is satisfied iff the following holds. There exists $B_{l+1}$ that extends $B_l$ and $B_k$, and the replica observes an undisturbed-$2\Delta$ period after obtaining $B_{l+1}$ during which (i) no equivocating block is observed, and (ii) no blame certificate/view change certificate for view $v$ was obtained, i.e.,

$$\min(\text{current-time}, \text{t-equiv}_{l,v}, \text{t-viewchange}_v) - \text{t-lock}_{l,v} \ge 2\Delta$$

where $\text{t-equiv}_{l,v}$ denotes the time at which equivocation for $B_l$ in view $v$ was observed ($\infty$ if no equivocation), $\text{t-viewchange}_v$ denotes the time at which view change happened from view $v$ to $v + 1$ ($\infty$ if no view change has happened yet), and $\text{t-lock}_{l,v}$ denotes the time at which $B_l$ was locked (or $B_{l+1}$ was proposed) in view $v$. Note that the learner does not require the $q_r$ fraction of replicas to report the same height $l$ or view $v$.

Safety of CR2. A learner believing in synchrony assumes that all messages between replicas arrive within $\Delta$ time after they were sent. If the learner's chosen $\Delta$ is a correct upper bound on message delay, then a CR2 commit is safe against fewer than $q_r$ faulty replicas (Byzantine plus a-b-c), as we explain below. If fewer than $q_r$ replicas are faulty, at least one honest replica reported an undisturbed-$2\Delta$ period. Let us call this honest replica $h$ and analyze the situation from $h$'s perspective to explain why an undisturbed $2\Delta$ period ensures safety. Observe that replicas in Flexible BFT forward the proposal when voting. If $\Delta$-synchrony holds, every other honest replica learns about the proposal $B_l$ at most $\Delta$ time after $h$ learns about it. If any honest replica voted for a conflicting block or quit view $v$, $h$ would have known within $2\Delta$ time.

5.4 Safety and Liveness
We introduce the notion of direct and indirect commit to aid the proofs. We say a block is committed directly under CR1 if the block and its immediate successor both get $q_c$ votes in the same view. We say a block is committed directly under CR2 if some honest replica reports an undisturbed-$2\Delta$ period after its successor block was obtained. We say a block is committed indirectly if neither condition applies to it but it is committed as a result of a block extending it being committed directly. We remark that the direct commit notion, especially for CR2, is merely a proof technique. A learner cannot tell whether a replica is honest, and thus has no way of knowing whether a block is directly committed under CR2.

Lemma 1. If a learner directly commits a block $B_l$ in view $v$ using a correct commit rule, then a certified block that ranks no lower than $C_v^{q_r}(B_l)$ must equal or extend $B_l$.

Proof. To elaborate on the lemma, a certified block $C_{v'}^{q_r}(B'_{l'})$ ranks no lower than $C_v^{q_r}(B_l)$ if either (i) $v' = v$ and $l' \ge l$, or (ii) $v' > v$. We need to show that if $B_l$ is directly committed, then any certified block that ranks no lower either equals or extends $B_l$. We consider the two commit rules separately. For both commit rules, we will use induction on $v'$ to prove the lemma.

For CR1 with parameter $q_c$ to be correct, flexible quorum intersection needs to hold, i.e., the fraction of faulty replicas must be less than $q_c + q_r - 1$. $B_l$ being directly committed under CR1 with parameter $q_c$ implies that there are $q_c$ votes in view $v$ for $B_l$ and $B_{l+1}$ where $B_{l+1}$ extends $B_l$.
For the base case, a block $B'_{l'}$ with $l' \ge l$ that does not extend $B_l$ cannot get certified in view $v$, because that would require $q_c + q_r - 1$ replicas to vote for two equivocating blocks in view $v$.

Next, we show the inductive step. Note that $q_c$ replicas voted for $B_{l+1}$ in view $v$, which contains $C_v^{q_r}(B_l)$. Thus, they lock $B_l$ or a block extending $B_l$ by the end of view $v$. Due to the inductive hypothesis, any certified block that ranks equally or higher from view $v$ up to view $v'$ either equals or extends $B_l$. Thus, by the end of view $v'$, those $q_c$ replicas still lock $B_l$ or a block extending $B_l$. Since the total fraction of faults is less than $q_c + q_r - 1$, the status $S$ shown by the leader of view $v' + 1$ must include a certificate for $B_l$ or a block extending it; moreover, any certificate that ranks equal to or higher than $C_v^{q_r}(B_l)$ is for a block that equals or extends $B_l$. Thus, only a block that equals or extends $B_l$ can gather votes from those $q_c$ replicas in view $v' + 1$ and only a block that equals or extends $B_l$ can get certified in view $v' + 1$.

For CR2 with synchrony bound $\Delta$ to be correct, $\Delta$ must be an upper bound on worst-case message delay and the fraction of faulty replicas must be less than $q_r$. $B_l$ being directly committed under CR2 with $\Delta$-synchrony implies that at least one honest replica voted for $B_{l+1}$ extending $B_l$ in view $v$, and did not hear an equivocating block or view change within $2\Delta$ time after that. Call this replica $h$. Suppose $h$ voted for $B_{l+1}$ extending $B_l$ in view $v$ at time $t$, and did not hear an equivocating block or view change by time $t + 2\Delta$.

We first show the base case: a block $B'_{l'}$ with $l' \ge l$ certified in view $v$ must equal or extend $B_l$. Observe that if $B'_{l'}$ with $l' \ge l$ does not equal or extend $B_l$, then it equivocates $B_l$. No honest replica voted for $B'_{l'}$ before time $t + \Delta$, because otherwise $h$ would have received the vote for $B'_{l'}$ by time $t + 2\Delta$. No honest replica would vote for $B'_{l'}$ after time $t + \Delta$ either, because by then they would have received (from $h$) and voted for $B_l$. Thus, $B'_{l'}$ cannot get certified in view $v$.

We then show the inductive step. Because $h$ did not hear view change by time $t + 2\Delta$, all honest replicas are still in view $v$ by time $t + \Delta$, which means they all receive $B_{l+1}$ from $h$ by the end of view $v$. Thus, they lock $B_l$ or a block extending $B_l$ by the end of view $v$. Due to the inductive hypothesis, any certified block that ranks equally or higher from view $v$ up to view $v'$ either equals or extends $B_l$. Thus, by the end of view $v'$, all honest replicas still lock $B_l$ or a block extending $B_l$. Since the total fraction of faults is less than $q_r$, the status $S$ shown by the leader of view $v' + 1$ must include a certificate for $B_l$ or a block extending it; moreover, any certificate that ranks equal to or higher than $C_v^{q_r}(B_l)$ is for a block that equals or extends $B_l$. Thus, only a block that equals or extends $B_l$ can gather honest votes in view $v' + 1$ and only a block that equals or extends $B_l$ can get certified in view $v' + 1$. □

Theorem 2 (Safety). Two learners with correct commit rules commit the same block $B_k$ for each height $k$.

Proof. Suppose for contradiction that two distinct blocks $B_k$ and $B'_k$ are committed at height $k$. Suppose $B_k$ is committed as a result of $B_l$ being directly committed in view $v$ and $B'_k$ is committed as a result of $B'_{l'}$ being directly committed in view $v'$. This implies $B_l$ is or extends $B_k$; similarly, $B'_{l'}$ is or extends $B'_k$. Without loss of generality, assume $v \le v'$. If $v = v'$, further assume $l \le l'$ without loss of generality. By Lemma 1, the certified block $C_{v'}^{q_r}(B'_{l'})$ must equal or extend $B_l$. Thus, $B'_k = B_k$. □

Theorem 3 (Liveness). If all learners have correct commit rules, they all keep committing new blocks.

Proof. By the definition of a-b-c faults, if they cannot violate safety, they will preserve liveness. Theorem 2 shows that if all learners have correct commit rules, then safety is guaranteed even if a-b-c replicas behave arbitrarily. Thus, once we have proved safety, we can treat a-b-c replicas as honest when proving liveness.

Observe that a correct commit rule tolerates at most $1 - q_r$ Byzantine faults. If a Byzantine leader prevents liveness, there will be $q_r$ blame messages against it, and a view change will ensue to replace the leader. Eventually, a non-Byzantine (honest or a-b-c) replica becomes the leader and drives consensus at new heights. If replicas use increasing timeouts, eventually, all non-Byzantine replicas stay in the same view for sufficiently long. When both conditions occur, if a learner's commit rule is correct (either CR1 or CR2), due to quorum availability, it will receive enough votes in the same view to commit. □

5.5 Efficiency

Latency. Learners with a synchrony assumption incur a latency of $2\Delta$ plus a few network speed rounds. In terms of the maximum network delay $\Delta$, this matches the state-of-the-art synchronous protocols [4]. The distinction though is that $\Delta$ now depends on the learner assumption and hence different learners may commit with different latencies. Learners with partial-synchrony assumptions incur a latency of two rounds of voting; this matches PBFT [11].

Communication. Every vote and new-view message is broadcast to all replicas, incurring $O(n^2)$ communication messages. This is the same as the complexity of PBFT [11] and Sync HotStuff [4].

Additional storage for replicas. Flexible BFT needs to store some additional information compared to existing BFT protocols in order to support diverse commit rules. For partially synchronous learners, replicas need to maintain the number of votes for every block in a view. For synchronous learners, replicas need to maintain the time at which equivocations or view changes occur. These values are used to appropriately respond to the learner based on its preferences for $q_c$ and $\Delta$.

6 DISCUSSION
As we have seen, three parameters $q_r$, $q_c$, and $\Delta$ determine the protocol. $q_r$ is the only parameter for the replicas and is picked by the service administrator. The choice of $q_r$ determines a set of learner assumptions that can be supported. $q_c$ and $\Delta$ are chosen by learners to commit blocks. In this section, we first discuss the learner assumptions supported by a given $q_r$ and then discuss the trade-offs between different choices of $q_r$.

6.1 Learner Assumptions Supported by $q_r$
Figure 5 represents the learners supported at $q_r = 2/3$. The x-axis represents Byzantine faults and the y-axis represents total faults (Byzantine plus a-b-c). Each point on this graph represents a
learner fault assumption as a pair: (Byzantine faults, total faults). The shaded gray area indicates an "invalid area" since we cannot have fewer total faults than Byzantine faults. A missing dimension in this figure is the choice of $\Delta$. Thus, the synchrony guarantee shown in this figure is for learners that choose a correct synchrony bound.

[Figure 5 plot: x-axis, fraction of Byzantine faults (0.0 to 0.5); y-axis, fraction of total faults (0.0 to 1.0); legend: Sync, Partial Sync.]
Figure 5: Learners supported for $q_r = 2/3$.

[Figure 6 plot: same axes; legend ($q_r$ values): 0.80, 0.75, 0.67, 0.60, 0.50.]
Figure 6: Learners supported by Flexible BFT at different $q_r$'s. The legend represents the different $q_r$ values.

Learners with partial-synchrony assumptions can get fault tolerance on (or below) the starred orange line. The rightmost point on the line is (1/3, 1/3), i.e., we tolerate less than a third of Byzantine replicas and no additional a-b-c replicas. This is the setting of existing partially synchronous consensus protocols [11, 14, 36]. Flexible BFT generalizes these protocols by giving learners the option of moving up-left along the line, i.e., tolerating fewer Byzantine and more total faults. By choosing $q_c > q_r$, a learner tolerates fewer than $q_c + q_r - 1$ total faults for safety and at most $1 - q_c$ Byzantine faults for liveness. In other words, as a learner moves left, for every additional vote it requires, it tolerates one fewer Byzantine fault and gains overall one higher total number of faults (i.e., two more a-b-c faults). The leftmost point on this line is (0, 2/3), tolerating no Byzantine replicas and the highest fraction of a-b-c replicas.

Moreover, for learners who believe in synchrony, if their $\Delta$ assumption is correct, they enjoy 1/3 Byzantine tolerance and 2/3 total tolerance, represented by the green diamond. This is because synchronous commit rules are not parameterized by the number of votes received.

How do learners pick their commit rules? In Figure 5, the shaded starred orange portion of the plot represents fault tolerance provided by the partially synchronous commit rule (CR1). Specifically, setting $q_c$ to the total fault fraction yields the necessary commit rule. On the other hand, if a learner's required fault tolerance lies in the circled green portion of the plot, then the synchronous commit rule (CR2) with an appropriate $\Delta$ picked by the learner yields the necessary commit rule. Finally, if a learner's target fault tolerance corresponds to the white region of the plot, then it is not achievable with this $q_r$.

Learners with incorrect assumptions and recovery. If a learner has an incorrect assumption with respect to the fault threshold or synchrony parameter $\Delta$, then it can lose safety or liveness. A learner detects a safety violation if it observes (possibly out-of-band) that a conflicting value has been committed, perhaps with a safer commit rule.

For a learner believing in synchrony, if it picks too small a $\Delta$ and commits a value $b$, it is possible that a conflicting value $b'$ may also be certified. Replicas may choose to extend the branch containing $b'$, effectively reverting $b$ and causing a safety violation. If a learner detects such a safety violation, it may need to revert some of its commits and increase $\Delta$ to recover.

For a learner with a partial-synchrony assumption, if it loses safety, it can update its fault model to move left along the orange starred line, i.e., tolerate higher total faults but fewer Byzantine faults. On the other hand, if it observes no progress because its threshold $q_c$ is not met, then it moves towards the right. However, if the true fault model is in the circled green region in Figure 5, then the learner cannot find a partially synchronous commit rule that is both safe and live, and eventually it has to switch to using a synchronous commit rule.

Recall that the goal of a-b-c replicas is to attack safety. Thus, learners with incorrect assumptions may be exploited by a-b-c replicas for their own gain (e.g., by double-spending). This is remotely analogous to Bitcoin: if a learner commits to a transaction when it is a few blocks deep and a powerful adversary succeeds in creating an alternative longer fork, the commit is reverted. When a learner updates to a correct assumption and recovers from unsafe commits, its subsequent commits would be safe and final.

6.2 Comparing Different $q_r$ Choices
We now look at the service administrator's choice of $q_r$. In general, the service administrator's goal is to tolerate a large number of Byzantine and a-b-c faults, i.e., move towards top and/or
right of the figure. Figure 6 shows the trade-offs in terms of learners supported by different $q_r$ values in Flexible BFT.

First, it can be observed that for learners with partial-synchrony assumptions, $q_r \ge 2/3$ dominates $q_r < 2/3$. Observe that the fraction of Byzantine replicas ($B$) is bounded by $B < q_c + q_r - 1$ and $B \le 1 - q_c$, so $B \le q_r/2$ (adding the two bounds gives $2B < q_r$). Thus, as $q_r$ decreases, Byzantine fault tolerance decreases. Moreover, since the total fault tolerance is $q_c + q_r - 1$, a lower $q_r$ also tolerates a smaller fraction of total faults for a fixed $q_c$.

For $q_r \ge 2/3$ or for learners believing in synchrony, no value of $q_r$ is Pareto optimal. For learners with partial-synchrony assumptions, as $q_r$ increases, the total fault tolerance for safety increases. But since $q_c \ge q_r$, we have $B \le 1 - q_r$, and hence the Byzantine tolerance for liveness decreases. For learners believing in synchrony, the total fault tolerance for safety is $< q_r$ and the Byzantine fault tolerance for liveness is $\ge 1 - q_r$. In both cases, the choice of $q_r$ represents a safety-liveness trade-off.

6.3 Separating Alive-but-corrupt Resilience from Diversity
So far, we presented the Flexible BFT techniques and protocols to simultaneously support diverse learners and stronger a-b-c fault tolerance. Indeed, we believe both properties are desirable and they strengthen each other. But we remark that these two properties can be provided separately.

It is relatively straightforward to provide stronger fault tolerance in the a-b-c model in a classic uniform setting. For example, under partial synchrony, one can simply use a larger quorum in PBFT (without the $q_r$/$q_c$ replica/learner quorum separation). But we note that a higher total (a-b-c plus Byzantine) tolerance comes at the price of a lower Byzantine tolerance. In a uniform setting, this means all learners have to sacrifice some Byzantine tolerance. In the diverse setting, Flexible BFT gives learners the freedom to choose the fault assumption they believe in, and a learner can choose the classic Byzantine fault model.

On the flip side, if one hopes to support diverse learners in the classic Byzantine fault model (no a-b-c faults), the "dimension of diversity" reduces. One example is the network speed replica protocol in Section 3, which supports learners that believe in different synchrony bounds. That protocol can be further extended to support learners with a (uniform) partial-synchrony assumption. Learners with a partial-synchrony assumption are uniform since we have not identified any type of "diversity" outside a-b-c faults for them.

7 RELATED WORK
Most BFT protocols are designed with a uniform assumption about the system. The literature on BFT consensus is vast and is largely beyond scope for review here; we refer the reader to the standard textbooks in distributed computing [6, 26].

Resilience. Figure 7 compares resilience in Flexible BFT with some existing consensus protocols. The x axis represents a Byzantine resilience threshold, the y axis the total resilience against corruption under the a-b-c fault model. The three different colors (red, green, blue) represent three possible instantiations of Flexible BFT at different $q_r$'s. Each point in the figure represents an abstract "learner" belief. For the partial synchrony model, learner beliefs form lines, and for synchronous settings, learner beliefs are individual circles. The locus of points on a given color represents all learner assumptions supported for a corresponding $q_r$, representing the diversity of learners supported. The figure depicts state-of-the-art resilience combinations by existing consensus solutions via uncolored shapes, +, ×, △, ▲, ♢, ♦. Partially synchronous protocols [8, 11, 36] that tolerate one-third Byzantine faults can all be represented by the '+' symbol at (1/3, 1/3). Similarly, synchronous protocols [1, 3, 18] that tolerate one-half Byzantine faults are represented by the '×' symbol at (1/2, 1/2). It is worth noting that some of these works employ two commit rules that differ in number of votes or synchrony [4, 22, 29, 32]. For instance, Thunderella and Sync HotStuff optimistically commit in an asynchronous fashion based on quorums of size $\ge 3/4$, as represented by a hollow triangle at (1/4, 1/2). Similarly, FaB [29], Zyzzyva [22] and SBFT [17] optimistically commit when they receive all votes but wait for two rounds of votes otherwise. These are represented by two points in the figure. Despite the two commit rules, these protocols do not have learner diversity: all parties involved (replicas and learners) make the same assumptions and reach the same commit decisions.

[Figure 7 plot: x-axis, fraction of Byzantine faults (0.0 to 0.5); y-axis, fraction of total faults (0.0 to 1.0). Color legend ($q_r$ values): 0.75, 0.67, 0.50. Marker legend: × Partially Synchronous protocols [8, 11, 22, 29, 35, 36]; + Synchronous Protocols [1, 4, 18, 32]; ▲ Thunderella, Sync HotStuff (△: optimistic) [4, 32]; ♦ Zyzzyva, SBFT (♢: optimistic) [22].]
Figure 7: Comparing Flexible BFT to existing consensus protocols. The legend represents different $q_r$ values.

Diverse learner beliefs. A simple notion of learner diversity exists in Bitcoin's probabilistic commit rule. One learner may consider a transaction committed after six confirmations while another may require only one confirmation. Generally, the notion of learner diversity has been discussed informally at public blockchain forums. Another example of diversity is considered in the XFT protocol [25]. The protocol supports two types of learners: learners that assume crash faults under partial synchrony, or learners that assume Byzantine faults but believe in synchrony. Yet another notion of diversity is considered by the federated Byzantine consensus model and the
Stellar protocol [30]. The Stellar protocol allows nodes to pick their alive-but-corrupt fault model did not specify what these replicas
own quorums. Our Flexible BFT approach instead considers diverse would do if they can violate safety for some learners. In particular,
learners in terms of a-b-c adversaries and synchrony. The model they may stop helping liveness. However, we believe this will not
and techniques in [30] and our paper are largely orthogonal and be a concern once we move to a more realistic rational model. In
complementary. that case, the best strategy for alive-but-corrupt replicas is to attack
the safety of learners with unsafe commit rules while preserving
Flexible Paxos. Flexible Paxos by Howard et al. [19] observes that liveness for learners with correct commit rules. Such an analysis
Paxos may use non-intersecting quorums within a view but an inter- in the rational fault model remains interesting future work. Our
section is required across views. Our Flexible Quorum Intersection protocol also assumes that all replicas have clocks that advance at
(b) can be viewed as its counterpart in the Byzantine and a-b-c the same rate. It is interesting to explore whether our protocol can
setting. In addition, Flexible BFT applies the flexible quorum idea be modified to work with clock drifts.
to support diverse learners with different fault model and timing
assumptions. ACKNOWLEDGEMENT
Mixed fault model. Fault models that mix Byzantine and crash We thank Ittai Abraham and Ben Maurer for many useful discus-
faults have been considered in various works, e.g., FaB [29], Up- sions on Flexible BFT. We thank Marcos Aguilera for many insight-
Right [12], and SBFT [4]. These works do not support diverse learn- ful comments on an earlier draft of this work.
ers or stronger resilience. The a-b-c faults are in a sense the opposite
of crash faults, mixing Byzantine with “anti-crashes”. Our a-b-c REFERENCES
adversary bears similarity to a rational adversary in Aiyer et al. [5] [1] Ittai Abraham, Srinivas Devadas, Danny Dolev, Kartik Nayak, and Ling Ren.
2019. Synchronous Byzantine Agreement with Expected O (1) Rounds, Expected
and Groce et al. [16] with several important differences. Aiyer et O (n 2 ) Communication, and Optimal Resilience. In Financial Cryptography and
al. assumes no collusion between rational replicas while Groce et Data Security (FC).
al. assumes collusion but no Byzantine replicas. Aiyer et al. only [2] Ittai Abraham, Danny Dolev, Rica Gonen, and Joe Halpern. 2006. Distributed
computing meets game theory: robust mechanisms for rational secret sharing and
considers partial synchrony while Groce et al. only considers syn- multiparty computation. In Proceedings of the twenty-fifth annual ACM symposium
chrony. In contrast, Flexible BFT considers colluding a-b-c faults on Principles of distributed computing. ACM, 53–62.
and Byzantine faults and supports both partial synchrony and syn- [3] Ittai Abraham, Dahlia Malkhi, Kartik Nayak, and Ling Ren. 2018. Dfinity Con-
sensus, Explored. Cryptology ePrint Archive, Report 2018/1153.
chrony. Aiyer et al. provides a game theoretic proof. More generally, [4] Ittai Abraham, Dahlia Malkhi, Kartik Nayak, Ling Ren, and Maofan Yin. 2019.
game theoretical modeling and analysis with collusion have been Sync HotStuff: Simple and Practical State Machine Replication. Cryptology ePrint
Archive, Report 2019/270. https://eprint.iacr.org/2019/270.
performed to other problems such as secret sharing and multiparty [5] Amitanand S Aiyer, Lorenzo Alvisi, Allen Clement, Mike Dahlin, Jean-Philippe
computation [2, 15, 21, 27]. A game theoretic proof for Flexible BFT Martin, and Carl Porth. 2005. BAR fault tolerance for cooperative services. In
remains open. ACM SIGOPS operating systems review, Vol. 39. ACM, 45–58.
[6] Hagit Attiya and Jennifer Welch. 2004. Distributed computing: fundamentals,
simulations, and advanced topics. Vol. 19. John Wiley & Sons.
Mixed timing model. Subsequent to this work, Blum et al. [7] [7] Erica Blum, Jonathan Katz, and Julian Loss. 2019. Synchronous Consensus with
designed a Byzantine Agreement protocol that supports different Optimal Asynchronous Fallback Guarantees. Cryptology ePrint Archive. (2019).
fractions of Byzantine faults under synchrony and asynchrony (in- [8] Ethan Buchman. 2016. Tendermint: Byzantine fault tolerance in the age of
blockchains. Ph.D. Dissertation.
stead of synchrony and partial synchrony as done in our work). [9] Vitalik Buterin and Virgil Griffith. 2017. Casper the Friendly Finality Gadget.
Using the terms described in this paper, their protocol can support CoRR abs/1710.09437 (2017). arXiv:1710.09437 http://arxiv.org/abs/1710.09437
two types of learners: one type believes in synchrony and a larger [10] Christian Cachin, Klaus Kursawe, and Victor Shoup. 2005. Random oracles in Con-
stantinople: Practical asynchronous Byzantine agreement using cryptography.
Byzantine threshold while the other type believes asynchrony and Journal of Cryptology 18, 3 (2005), 219–246.
a smaller Byzantine threshold. However, Blum et al. do not consider [11] Miguel Castro and Barbara Liskov. 1999. Practical Byzantine fault tolerance. In
OSDI, Vol. 99. 173–186.
diversity in terms of supporting different ∆’s, different quorum [12] Allen Clement, Manos Kapritsos, Sangmin Lee, Yang Wang, Lorenzo Alvisi, Mike
sizes, or a-b-c faults. Dahlin, and Taylor Riche. 2009. Upright Cluster Services. In Proceedings of the
ACM SIGOPS 22Nd Symposium on Operating Systems Principles (SOSP ’09). ACM,
New York, NY, USA, 277–290. https://doi.org/10.1145/1629575.1629602
[13] Danny Dolev and H. Raymond Strong. 1983. Authenticated algorithms for
8 CONCLUSION AND FUTURE WORK Byzantine agreement. SIAM J. Comput. 12, 4 (1983), 656–666.
[14] Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. 1988. Consensus in the
We present Flexible BFT, a protocol that supports diverse learners presence of partial synchrony. J. ACM 35, 2 (1988), 288–323.
with different assumptions to use the same ledger. Flexible BFT [15] S Dov Gordon and Jonathan Katz. 2006. Rational secret sharing, revisited. In
allows the learners to tolerate combined (Byzantine plus alive-but- International Conference on Security and Cryptography for Networks. Springer,
229–241.
corrupt) faults exceeding 1/2 and 1/3 for synchrony and partial [16] Adam Groce, Jonathan Katz, Aishwarya Thiruvengadam, and Vassilis Zikas. 2012.
synchrony respectively. At a technical level, under synchrony, we Byzantine Agreement with a Rational Adversary. In Automata, Languages, and
show a synchronous protocol where the replicas execute a network Programming, Artur Czumaj, Kurt Mehlhorn, Andrew Pitts, and Roger Watten-
hofer (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 561–572.
speed protocol and only the commit rule uses the synchrony as- [17] Guy Golan Gueta, Ittai Abraham, Shelly Grossman, Dahlia Malkhi, Benny Pinkas,
sumption. For partial synchrony, we introduce the notion of Flexible Michael K Reiter, Dragos-Adrian Seredinschi, Orr Tamir, and Alin Tomescu. 2019.
SBFT: a scalable decentralized trust infrastructure for blockchains. In DSN.
Byzantine Quorums by deconstructing existing BFT protocols to [18] Timo Hanke, Mahnush Movahedi, and Dominic Williams. 2018. DFINITY Tech-
understand the role played by the different quorums. We combine nology Overview Series, Consensus System. arXiv preprint arXiv:1805.04548
the two to form Flexible BFT which obtains the best of both worlds. (2018).
[19] Heidi Howard, Dahlia Malkhi, and Alexander Spiegelman. 2016. Flexible Paxos:
Our liveness proof in Section 5.4 employs a strong assumption Quorum Intersection Revisited. In OPODIS (LIPIcs), Vol. 70. Schloss Dagstuhl -
that all learners have correct commit rules. This is because our Leibniz-Zentrum fuer Informatik, 25:1–25:14.
[20] Jonathan Katz and Chiu-Yuen Koo. 2009. On expected constant-round protocols for byzantine agreement. J. Comput. System Sci. 75, 2 (2009), 91–112.
[21] Gillat Kol and Moni Naor. 2008. Cryptography and game theory: Designing protocols for exchanging information. In Theory of Cryptography Conference. Springer, 320–339.
[22] Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong. 2007. Zyzzyva: speculative byzantine fault tolerance. In ACM SIGOPS Operating Systems Review, Vol. 41. ACM, 45–58.
[23] Leslie Lamport. 2006. Fast Paxos. Distributed Computing 19, 2 (2006), 79–103.
[24] Leslie Lamport, Robert Shostak, and Marshall Pease. 1982. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems 4, 3 (1982), 382–401.
[25] Shengyun Liu, Christian Cachin, Vivien Quéma, and Marko Vukolic. 2016. XFT: practical fault tolerance beyond crashes. In 12th USENIX Symposium on Operating Systems Design and Implementation. USENIX Association, 485–500.
[26] Nancy A Lynch. 1996. Distributed algorithms. Elsevier.
[27] Anna Lysyanskaya and Nikos Triandopoulos. 2006. Rationality and adversarial behavior in multi-party computation. In Annual International Cryptology Conference. Springer, 180–197.
[28] Dahlia Malkhi and Michael Reiter. 1997. Byzantine Quorum Systems. In Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing (STOC ’97). ACM, New York, NY, USA, 569–578. https://doi.org/10.1145/258533.258650
[29] J-P Martin and Lorenzo Alvisi. 2006. Fast Byzantine consensus. IEEE Transactions on Dependable and Secure Computing 3, 3 (2006), 202–215.
[30] David Mazieres. 2015. The stellar consensus protocol: A federated model for internet-level consensus.
[31] Silvio Micali. 2016. Algorand: The efficient and democratic ledger. arXiv:1607.01341.
[32] Rafael Pass and Elaine Shi. 2018. Thunderella: Blockchains with optimistic instant confirmation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 3–33.
[33] M. Pease, R. Shostak, and L. Lamport. 1980. Reaching Agreement in the Presence of Faults. J. ACM 27, 2 (April 1980), 228–234. https://doi.org/10.1145/322186.322188
[34] Fred B Schneider. 1990. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys (CSUR) 22, 4 (1990), 299–319.
[35] Jian Yin, Jean-Philippe Martin, Arun Venkataramani, Lorenzo Alvisi, and Mike Dahlin. 2003. Separating agreement from execution for byzantine fault tolerant services. ACM SIGOPS Operating Systems Review 37, 5 (2003), 253–267.
[36] Maofan Yin, Dahlia Malkhi, Michael K Reiter, Guy Golan Gueta, and Ittai Abraham. 2019. BFT Consensus with Linearity and Responsiveness. In Proceedings of the ACM 38th Symposium on Principles of Distributed Computing (2019).
