Flex BFT Ccs19
Commit rules for learners. A value b is said to be committed by a learner assuming ∆-synchrony iff $q_r$ replicas each report that there exists a view v such that,
(1) b is certified, i.e., $C_v^{q_r}(b)$ exists.
(2) the replica observed an undisturbed-2∆ period after certification, i.e., no equivocating value or view change was observed at a time before 2∆ after it was certified, or more formally, $\min(\text{current-time}, \text{t-equiv}_v, \text{t-viewchange}_v) - \text{t-lock}_v \geq 2\Delta$.
on b. For an honest replica to unlock from b in subsequent views, $q_r$ replicas need to claim that they are not locked on b. At least one of them is honest and would need to falsely claim it is not locked, which cannot happen.
Quorum availability. Byzantine replicas do not exceed $1 - q_r$ so that $q_r$ replicas respond to the leader.

underlie existing partially synchronous protocols that tolerate 1/3 Byzantine faults (Section 4.1). We will illustrate that multiple uses of 2/3-quorums actually serve different purposes in these protocols. We then generalize these protocols to use Flexible Byzantine Quorums (Section 4.2), the key idea that enables more than 1/3 fault tolerance and allows diverse learners with varying assumptions to co-exist.
- R keeps track of the number of votes received for this block in this view as $q_{B_k,v}$.
- If block $B_{k-1}$ has been proposed in view v, R marks $B_{k-1}$ as a locked block and records the locked time as $\text{t-lock}_{k-1,v}$.
- If a block equivocating $B_{k-1}$ is proposed by L in view v (possibly received through a vote), R records the time $\text{t-equiv}_{k-1,v}$ at which the equivocating block is received.
The replica then enters the next iteration. If the replica observes no progress or equivocating blocks in the same view v, it stops voting in view v and sends a $\langle \text{blame}, v \rangle_r$ message to all replicas.
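The per-view bookkeeping listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `ViewState`, its fields, and the use of plain string block identifiers and voter identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ViewState:
    """Bookkeeping kept by one replica for a single view (hypothetical layout)."""
    votes: dict = field(default_factory=dict)     # block id -> set of voter ids
    proposed: dict = field(default_factory=dict)  # height -> first block id seen
    t_lock: dict = field(default_factory=dict)    # height -> time the block was locked
    t_equiv: dict = field(default_factory=dict)   # height -> time of first equivocation

    def on_vote(self, block_id, voter):
        """Record a vote and return the current vote count for the block."""
        self.votes.setdefault(block_id, set()).add(voter)
        return len(self.votes[block_id])

    def on_proposal(self, height, block_id, now):
        """Record a proposal seen at `now` (directly or through a vote)."""
        first = self.proposed.setdefault(height, block_id)
        if first != block_id:
            # A second, different block at the same height is an equivocation.
            self.t_equiv.setdefault(height, now)
            return
        # A proposal at height k locks the block at height k-1,
        # provided that block was proposed in this same view.
        if height - 1 in self.proposed:
            self.t_lock.setdefault(height - 1, now)
```

Keeping voters in a set means repeated votes from the same replica are counted once; whether the protocol needs that deduplication at this layer is an assumption of the sketch.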
by all replicas and the number of votes for a block can be greater than $q_r$. $q_{B_k,v}$ will be updated each time the replica hears about a new vote in view v.
- Lock time. If $B_{k-1}$ was proposed in the same view v, it locks $B_{k-1}$ and records the locked time as $\text{t-lock}_{k-1,v}$.
- Equivocation time. If the replica ever observes an equivocating block at height k in view v through a proposal or vote, it stores the time of equivocation as $\text{t-equiv}_{k,v}$.

Looking ahead, the locked time $\text{t-lock}_{k-1,v}$ and equivocation time $\text{t-equiv}_{k-1,v}$ will be used by learners with synchrony assumptions to commit, and the number of votes $q_{B_k,v}$ will be used by learners with partial-synchrony assumptions to commit.

Leader monitoring. If a replica detects a lack of progress in view v or observes malicious leader behavior such as more than one height-k block in the same view, it blames the leader of view v by broadcasting a ⟨blame, v⟩ message. It quits view v and stops voting and broadcasting blocks in view v. To determine lack of progress, the replicas may simply guess a time bound for message arrival or use increasing timeouts for each view [11].

View change. The view change protocol is described in Figure 3. If a replica gathers $q_r$ ⟨blame, v⟩ messages from distinct replicas, it forwards them to all other replicas and enters a new view v + 1 (Step (i)). It records the time at which it received the blame certificate as $\text{t-viewchange}_v$. Upon entering a new view, a replica reports to the leader of the new view $L'$ its locked block and transitions to the steady state (Step (ii)). $q_r$ status messages form the status S. The first block $L'$ proposes in the new view should extend the highest certified block among these $q_r$ status messages.

5.3 Learner Commit Rules
As mentioned in the introduction, Flexible BFT supports learners with different assumptions. Learners in Flexible BFT learn the state of the protocol from the replicas and based on their own assumptions determine whether a block has been committed. Broadly, we support two types of learners: those who believe in synchrony and those who believe in partial synchrony.

5.3.1 Learners with Partial-Synchrony Assumptions (CR1). A learner with partial-synchrony assumptions deduces whether a block has been committed based on the number of votes received by a block. A block $B_l$ (together with its ancestors) is committed with parameter $q_c$ iff $B_l$ and its immediate successor both receive ≥ $q_c$ votes in the same view.
Safety of CR1. A CR1 commit based on $q_c$ votes is safe against < $q_c + q_r - 1$ faulty replicas (Byzantine plus a-b-c). Observe that if $B_l$ gets $q_c$ votes in view v, due to flexible quorum intersection, a conflicting block cannot be certified in view v, unless ≥ $q_c + q_r - 1$ replicas are faulty. Moreover, $B_{l+1}$ extending $B_l$ has also received $q_c$ votes in view v. Thus, $q_c$ replicas lock block $B_l$ in view v. In subsequent views, honest replicas that have locked $B_l$ will only vote for a block that equals or extends $B_l$ unless they unlock. However, due to flexible quorum intersection, they will not unlock unless ≥ $q_c + q_r - 1$ replicas are faulty. The proof of Lemma 1 formalizes this argument.

5.3.2 Learners with Synchrony Assumptions (CR2). Intuitively, a CR2 commit involves $q_r$ replicas collectively stating that no “bad event” happens within “sufficient time” in a view. Here, a bad event refers to either leader equivocation or view change (the latter indicates that sufficient replicas believe the leader is faulty) and the “sufficient time” is 2∆, where ∆ is a synchrony bound chosen by the learner.
Let $L$ and $L'$ be the leaders of views v and v + 1, respectively.
(i) New-view. Upon gathering $q_r$ ⟨blame, v⟩ messages, broadcast them and enter view v + 1. Record the time as $\text{t-viewchange}_v$.
(ii) Status. Suppose $B_j$ is the block locked by the replica. Send a status of its locked block to the leader $L'$ using $\langle \text{status}, v, B_j, C_{v'}^{q_r}(B_j) \rangle$ and transition to the steady state. Here, $v'$ is the view in which $B_j$ was certified.
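The two view-change steps can be sketched in Python as follows. This is an illustration under stated assumptions: the function names `blame_quorum_reached` and `highest_certified`, the tuple layout `(cert_view, height, block_id)` for a status entry, and the reading of $q_r$ as a fraction of n replicas rounded up are all hypothetical. The ordering by (view, height) mirrors the ranking used in the proof of Lemma 1.

```python
import math
from fractions import Fraction


def blame_quorum_reached(blamers, n, q_r):
    """True once blame messages from q_r * n distinct replicas were gathered."""
    return len(set(blamers)) >= math.ceil(q_r * n)


def highest_certified(status_msgs):
    """Pick the highest-ranked certified block among the status entries.

    Each entry is (cert_view, height, block_id); a certificate ranks higher
    if its view is larger, or the view is equal and the height is larger.
    """
    return max(status_msgs, key=lambda s: (s[0], s[1]))


# Example with n = 4 replicas and q_r = 2/3 (three blames are required).
Q_R = Fraction(2, 3)
ready = blame_quorum_reached(["r1", "r2", "r3"], 4, Q_R)
```

Using `Fraction` rather than a float for the quorum avoids rounding surprises when computing the threshold.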
(CR1) Partially-synchronous commit. A block $B_k$ is committed under the partially synchronous rule with parameter $q_c$ iff there exist $l \geq k$ and v such that
(a) $C_v^{q_r}(B_l)$ and $C_v^{q_r}(B_{l+1})$ exist where $B_{l+1}$ extends $B_l$ and $B_k$ (if $l = k$, $B_l = B_k$).
(b) $q_{B_l,v} \geq q_c$ and $q_{B_{l+1},v} \geq q_c$.
(CR2) Synchronous commit. A block $B_k$ is committed assuming ∆-synchrony iff the following holds for $q_r$ replicas. There exist $l \geq k$ and v (possibly different across replicas) such that,
(a) $C_v^{q_r}(B_l)$ exists where $B_l$ extends $B_k$ (if $l = k$, $B_l = B_k$).
(b) An undisturbed-2∆ period is observed after $B_{l+1}$ is obtained, i.e., no equivocating block or view change of view v were observed before 2∆ time after $B_{l+1}$ was obtained, i.e.,
$$\min(\text{current-time}, \text{t-equiv}_{l,v}, \text{t-viewchange}_v) - \text{t-lock}_{l,v} \geq 2\Delta$$
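A learner-side check of the two rules might look like the following Python sketch. It is a simplified illustration, not the paper's code: the function names, the representation of a replica report as a `(t_lock, t_equiv, t_viewchange)` tuple, and the rounding of quorum fractions up to whole replicas are assumptions. Condition (a) of each rule (existence of the certificates) is assumed to have been verified separately.

```python
import math
from fractions import Fraction

INF = math.inf  # stands for "no equivocation / no view change observed yet"


def cr1_committed(votes_l, votes_l1, n, q_c):
    """CR1 (b): B_l and B_{l+1} each have at least q_c * n votes in one view."""
    need = math.ceil(q_c * n)
    return votes_l >= need and votes_l1 >= need


def undisturbed_2delta(now, t_lock, delta, t_equiv=INF, t_viewchange=INF):
    """CR2 (b): min(now, t_equiv, t_viewchange) - t_lock >= 2 * delta."""
    return min(now, t_equiv, t_viewchange) - t_lock >= 2 * delta


def cr2_committed(reports, n, q_r, delta, now):
    """CR2: at least q_r * n replicas report an undisturbed-2*delta period.

    `reports` holds one (t_lock, t_equiv, t_viewchange) tuple per replica;
    replicas may refer to different heights l and views v.
    """
    ok = sum(1 for (tl, te, tv) in reports
             if undisturbed_2delta(now, tl, delta, te, tv))
    return ok >= math.ceil(q_r * n)
```

Because `delta` is an argument of the check rather than replica state, two learners can evaluate the same reports with different synchrony bounds and reach different commit decisions at the same moment.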
More formally, a replica states that a synchronous commit for block $B_k$ for a given parameter ∆ (set by a learner) is satisfied iff the following holds. There exists $B_{l+1}$ that extends $B_l$ and $B_k$, and the replica observes an undisturbed-2∆ period after obtaining $B_{l+1}$ during which (i) no equivocating block is observed, and (ii) no blame certificate/view change certificate for view v was obtained, i.e.,

$$\min(\text{current-time}, \text{t-equiv}_{l,v}, \text{t-viewchange}_v) - \text{t-lock}_{l,v} \geq 2\Delta$$

where $\text{t-equiv}_{l,v}$ denotes the time equivocation for $B_l$ in view v was observed (∞ if no equivocation), $\text{t-viewchange}_v$ denotes the time at which view change happened from view v to v + 1 (∞ if no view change has happened yet), and $\text{t-lock}_{l,v}$ denotes the time at which $B_l$ was locked (or $B_{l+1}$ was proposed) in view v. Note that the learner does not require the $q_r$ fraction of replicas to report the same height l or view v.

Safety of CR2. A learner believing in synchrony assumes that all messages between replicas arrive within ∆ time after they were sent. If the learner’s chosen ∆ is a correct upper bound on message delay, then a CR2 commit is safe against fewer than $q_r$ faulty replicas (Byzantine plus a-b-c), as we explain below. If fewer than $q_r$ replicas are faulty, at least one honest replica reported an undisturbed-2∆ period. Let us call this honest replica h and analyze the situation from h’s perspective to explain why an undisturbed 2∆ period ensures safety. Observe that replicas in Flexible BFT forward the proposal when voting. If ∆-synchrony holds, every other honest replica learns about the proposal $B_l$ at most ∆ time after h learns about it. If any honest replica voted for a conflicting block or quit view v, h would have known within 2∆ time.

5.4 Safety and Liveness
We introduce the notion of direct and indirect commit to aid the proofs. We say a block is committed directly under CR1 if the block and its immediate successor both get $q_c$ votes in the same view. We say a block is committed directly under CR2 if some honest replica reports an undisturbed-2∆ period after its successor block was obtained. We say a block is committed indirectly if neither condition applies to it but it is committed as a result of a block extending it being committed directly. We remark that the direct commit notion, especially for CR2, is merely a proof technique. A learner cannot tell whether a replica is honest, and thus has no way of knowing whether a block is directly committed under CR2.

Lemma 1. If a learner directly commits a block $B_l$ in view v using a correct commit rule, then a certified block that ranks no lower than $C_v^{q_r}(B_l)$ must equal or extend $B_l$.

Proof. To elaborate on the lemma, a certified block $C_{v'}^{q_r}(B'_{l'})$ ranks no lower than $C_v^{q_r}(B_l)$ if either (i) $v' = v$ and $l' \geq l$, or (ii) $v' > v$. We need to show that if $B_l$ is directly committed, then any certified block that ranks no lower either equals or extends $B_l$. We consider the two commit rules separately. For both commit rules, we will use induction on $v'$ to prove the lemma.

For CR1 with parameter $q_c$ to be correct, flexible quorum intersection needs to hold, i.e., the fraction of faulty replicas must be less than $q_c + q_r - 1$. $B_l$ being directly committed under CR1 with parameter $q_c$ implies that there are $q_c$ votes in view v for $B_l$ and $B_{l+1}$ where $B_{l+1}$ extends $B_l$.
For the base case, a block $B'_{l'}$ with $l' \geq l$ that does not extend $B_l$ cannot get certified in view v, because that would require $q_c + q_r - 1$ replicas to vote for two equivocating blocks in view v.
Next, we show the inductive step. Note that $q_c$ replicas voted for $B_{l+1}$ in view v, which contains $C_v^{q_r}(B_l)$. Thus, they lock $B_l$ or a block extending $B_l$ by the end of view v. Due to the inductive hypothesis, any certified block that ranks equally or higher from view v up to view $v'$ either equals or extends $B_l$. Thus, by the end of view $v'$, those $q_c$ replicas still lock $B_l$ or a block extending $B_l$. Since the total fraction of faults is less than $q_c + q_r - 1$, the status S shown by the leader of view $v' + 1$ must include a certificate for $B_l$ or a block extending it; moreover, any certificate that ranks equal to or higher than $C_v^{q_r}(B_l)$ is for a block that equals or extends $B_l$. Thus, only a block that equals or extends $B_l$ can gather votes from those $q_c$ replicas in view $v' + 1$ and only a block that equals or extends $B_l$ can get certified in view $v' + 1$.

For CR2 with synchrony bound ∆ to be correct, ∆ must be an upper bound on worst case message delay and the fraction of faulty replicas must be less than $q_r$. $B_l$ being directly committed under CR2 with ∆-synchrony implies that at least one honest replica voted for $B_{l+1}$ extending $B_l$ in view v, and did not hear an equivocating block or view change within 2∆ time after that. Call this replica h. Suppose h voted for $B_{l+1}$ extending $B_l$ in view v at time t, and did not hear an equivocating block or view change by time t + 2∆.
We first show the base case: a block $B'_{l'}$ with $l' \geq l$ certified in view v must equal or extend $B_l$. Observe that if $B'_{l'}$ with $l' \geq l$ does not equal or extend $B_l$, then it equivocates $B_l$. No honest replica voted for $B'_{l'}$ before time t + ∆, because otherwise h would have received the vote for $B'_{l'}$ by time t + 2∆. No honest replica would vote for $B'_{l'}$ after time t + ∆ either, because by then they would have received (from h) and voted for $B_l$. Thus, $B'_{l'}$ cannot get certified in view v.
We then show the inductive step. Because h did not hear a view change by time t + 2∆, all honest replicas are still in view v by time t + ∆, which means they all receive $B_{l+1}$ from h by the end of view v. Thus, they lock $B_l$ or a block extending $B_l$ by the end of view v. Due to the inductive hypothesis, any certified block that ranks equally or higher from view v up to view $v'$ either equals or extends $B_l$. Thus, by the end of view $v'$, all honest replicas still lock $B_l$ or a block extending $B_l$. Since the total fraction of faults is less than $q_r$, the status S shown by the leader of view $v' + 1$ must include a certificate for $B_l$ or a block extending it; moreover, any certificate that ranks equal to or higher than $C_v^{q_r}(B_l)$ is for a block that equals or extends $B_l$. Thus, only a block that equals or extends $B_l$ can gather honest votes in view $v' + 1$ and only a block that equals or extends $B_l$ can get certified in view $v' + 1$. □

Theorem 2 (Safety). Two learners with correct commit rules commit the same block $B_k$ for each height k.

Proof. Suppose for contradiction that two distinct blocks $B_k$ and $B'_k$ are committed at height k. Suppose $B_k$ is committed as a result of $B_l$ being directly committed in view v and $B'_k$ is committed as a result of $B'_{l'}$ being directly committed in view $v'$. This implies $B_l$ is or extends $B_k$; similarly, $B'_{l'}$ is or extends $B'_k$. Without loss of generality, assume $v \leq v'$. If $v = v'$, further assume $l \leq l'$ without loss of generality. By Lemma 1, the certified block $C_{v'}^{q_r}(B'_{l'})$ must equal or extend $B_l$. Thus, $B'_k = B_k$. □

Theorem 3 (Liveness). If all learners have correct commit rules, they all keep committing new blocks.

Proof. By the definition of a-b-c faults, if they cannot violate safety, they will preserve liveness. Theorem 2 shows that if all learners have correct commit rules, then safety is guaranteed even if a-b-c replicas behave arbitrarily. Thus, once we have proved safety, we can treat a-b-c replicas as honest when proving liveness.
Observe that a correct commit rule tolerates at most $1 - q_r$ Byzantine faults. If a Byzantine leader prevents liveness, there will be $q_r$ blame messages against it, and a view change will ensue to replace the leader. Eventually, a non-Byzantine (honest or a-b-c) replica becomes the leader and drives consensus in new heights. If replicas use increasing timeouts, eventually, all non-Byzantine replicas stay in the same view for sufficiently long. When both conditions occur, if a learner’s commit rule is correct (either CR1 or CR2), due to quorum availability, it will receive enough votes in the same view to commit. □

5.5 Efficiency
Latency. Learners with a synchrony assumption incur a latency of 2∆ plus a few network speed rounds. In terms of the maximum network delay ∆, this matches the state-of-the-art synchronous protocols [4]. The distinction though is that ∆ now depends on the learner assumption and hence different learners may commit with different latencies. Learners with partial-synchrony assumptions incur a latency of two rounds of voting; this matches PBFT [11].

Communication. Every vote and new-view message is broadcast to all replicas, incurring $O(n^2)$ communication messages. This is the same as the complexity of PBFT [11] and Sync HotStuff [4].
Additional storage for replicas. Flexible BFT needs to store some additional information compared to existing BFT protocols in order to support diverse commit rules. For partially synchronous learners, replicas need to maintain the number of votes for every block in a view. For synchronous learners, replicas need to maintain the time at which equivocations or view changes occur. These values are used to appropriately respond to the learner based on its preferences for $q_c$ and ∆.

6 DISCUSSION
As we have seen, three parameters $q_r$, $q_c$, and ∆ determine the protocol. $q_r$ is the only parameter for the replicas and is picked by the service administrator. The choice of $q_r$ determines a set of learner assumptions that can be supported. $q_c$ and ∆ are chosen by learners to commit blocks. In this section, we first discuss the learner assumptions supported by a given $q_r$ and then discuss the trade-offs between different choices of $q_r$.

6.1 Learner Assumptions Supported by $q_r$
Figure 5 represents the learners supported at $q_r = 2/3$. The x-axis represents Byzantine faults and the y-axis represents total faults (Byzantine plus a-b-c). Each point on this graph represents a
Figure 5: Learners supported for $q_r = 2/3$. (x-axis: fraction of Byzantine faults; y-axis: fraction of total faults; legend: Sync, Partial Sync.)
Figure 6: Learners supported by Flexible BFT at different $q_r$’s. The legend represents the different $q_r$ values (0.80, 0.75, 0.67, 0.60, 0.50). (x-axis: fraction of Byzantine faults; y-axis: fraction of total faults.)
learner fault assumption as a pair: (Byzantine faults, total faults). The shaded gray area indicates an “invalid area” since we cannot have fewer total faults than Byzantine faults. A missing dimension in this figure is the choice of ∆. Thus, the synchrony guarantee shown in this figure is for learners that choose a correct synchrony bound.
Learners with partial-synchrony assumptions can get fault tolerance on (or below) the starred orange line. The rightmost point on the line is (1/3, 1/3), i.e., we tolerate less than a third of Byzantine replicas and no additional a-b-c replicas. This is the setting of existing partially synchronous consensus protocols [11, 14, 36]. Flexible BFT generalizes these protocols by giving learners the option of moving up-left along the line, i.e., tolerating fewer Byzantine and more total faults. By choosing $q_c > q_r$, a learner tolerates < $q_c + q_r - 1$ total faults for safety and ≤ $1 - q_c$ Byzantine faults for liveness. In other words, as a learner moves left, for every additional vote it requires, it tolerates one fewer Byzantine fault and gains overall one higher total number of faults (i.e., two more a-b-c faults). The leftmost point on this line is (0, 2/3), tolerating no Byzantine replicas and the highest fraction of a-b-c replicas.
Moreover, for learners who believe in synchrony, if their ∆ assumption is correct, they enjoy 1/3 Byzantine tolerance and 2/3 total tolerance, represented by the green diamond. This is because synchronous commit rules are not parameterized by the number of votes received.

How do learners pick their commit rules? In Figure 5, the shaded starred orange portion of the plot represents the fault tolerance provided by the partially synchronous commit rule (CR1). Specifically, setting $q_c$ to the total fault fraction yields the necessary commit rule. On the other hand, if a learner’s required fault tolerance lies in the circled green portion of the plot, then the synchronous commit rule (CR2) with an appropriate ∆ picked by the learner yields the necessary commit rule. Finally, if a learner’s target fault tolerance corresponds to the white region of the plot, then it is not achievable with this $q_r$.

Learners with incorrect assumptions and recovery. If a learner has an incorrect assumption with respect to the fault threshold or synchrony parameter ∆, then it can lose safety or liveness. A learner detects a safety violation if it observes (possibly out-of-band) that a conflicting value has been committed, perhaps with a safer commit rule.

For a learner believing in synchrony, if it picks too small a ∆ and commits a value b, it is possible that a conflicting value $b'$ may also be certified. Replicas may choose to extend the branch containing $b'$, effectively reverting b and causing a safety violation. If a learner detects such a safety violation, it may need to revert some of its commits and increase ∆ to recover.
For a learner with a partial-synchrony assumption, if it loses safety, it can update its fault model to move left along the orange starred line, i.e., tolerate higher total faults but fewer Byzantine faults. On the other hand, if it observes no progress as its threshold $q_c$ is not met, then it moves towards the right. However, if the true fault model is in the circled green region in Figure 5, then the learner cannot find a partially synchronous commit rule that is both safe and live and eventually it has to switch to using a synchronous commit rule.
Recall that the goal of a-b-c replicas is to attack safety. Thus, learners with incorrect assumptions may be exploited by a-b-c replicas for their own gain (e.g., by double-spending). This is remotely analogous to Bitcoin: if a learner commits to a transaction when it is a few blocks deep and a powerful adversary succeeds in creating an alternative longer fork, the commit is reverted. When a learner updates to a correct assumption and recovers from unsafe commits, its subsequent commits would be safe and final.

6.2 Comparing Different $q_r$ Choices
We now look at the service administrator’s choice in picking $q_r$. In general, the service administrator’s goal is to tolerate a large number of Byzantine and a-b-c faults, i.e., move towards the top and/or right of the figure. Figure 6 shows the trade-offs in terms of learners supported by different $q_r$ values in Flexible BFT.
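The bounds quoted in Section 6.1 can be evaluated for any choice of $q_r$ and $q_c$. The Python sketch below is only an arithmetic illustration: the function names are hypothetical, and the formulas are taken directly from the statements that CR1 tolerates fewer than $q_c + q_r - 1$ total faults for safety and at most $1 - q_c$ Byzantine faults for liveness, while CR2 tolerates at most $1 - q_r$ Byzantine faults and fewer than $q_r$ total faults. It does not check that the chosen $q_c$ lies in the range the paper considers.

```python
from fractions import Fraction


def cr1_tolerance(q_r, q_c):
    """Return (Byzantine bound, total-fault bound) for a CR1 learner.

    Byzantine faults must be <= 1 - q_c (liveness);
    total faults must be < q_c + q_r - 1 (safety).
    """
    return 1 - q_c, q_c + q_r - 1


def cr2_tolerance(q_r):
    """Return (Byzantine bound, total-fault bound) for a CR2 learner.

    Byzantine faults must be <= 1 - q_r (quorum availability);
    total faults must be < q_r (safety, given a correct synchrony bound).
    """
    return 1 - q_r, q_r


Q_R = Fraction(2, 3)
rightmost = cr1_tolerance(Q_R, Fraction(2, 3))  # q_c equal to q_r
leftmost = cr1_tolerance(Q_R, Fraction(1))      # every replica must vote
sync_point = cr2_tolerance(Q_R)
```

For $q_r = 2/3$ these evaluate to (1/3, 1/3), (0, 2/3), and (1/3, 2/3) respectively, which are the two endpoints of the partial-synchrony line and the synchronous point described for Figure 5.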
First, it can be observed that for learners with partial-synchrony assumptions, $q_r \geq 2/3$ dominates $q_r < 2/3$. Observe that the