CALM

arXiv:1901.01930v2 [cs.DC] 26 Jan 2019

1 INTRODUCTION
Nearly all of the software we use today is part of a distributed system. Apps on your phone participate with hosted services in the cloud; together they form a distributed system. Hosted services themselves are massively distributed systems, often running on machines spread across the globe. “Big data” systems and enterprise databases are distributed across many machines. Most scientific computing and machine learning systems work in parallel across multiple processors. Even legacy desktop operating systems and applications like spreadsheets and word processors are tightly integrated with distributed backend services.

Distributed systems are tricky, so their ubiquity should worry us. Multiple unreliable machines are running in parallel, sending messages to each other across network links with arbitrary delays. How can we be confident that our programs do what we want despite this chaos?

This problem is urgent, but it is not new. The traditional answer has been to reduce this complexity with memory consistency guarantees: assurances that accesses to memory (heap variables, database keys, etc.) occur in a controlled fashion. However, the mechanisms used to enforce these guarantees—coordination protocols—are often criticized as barriers to the high performance, scale and availability of distributed systems.

1.1 The High Cost of Coordination
Coordination protocols enable autonomous, loosely coupled machines to jointly decide how to control basic behaviors, including the order of access to shared memory. These protocols are among the most clever and widely cited ideas in distributed computing. Some well-known techniques include the Paxos and Two-Phase Commit protocols, and the global barriers underlying computational models like Bulk Synchronous Parallel.

Unfortunately, the expense of coordination protocols can make them “forbidden fruit” for programmers. James Hamilton from Amazon Web Services made this point forcefully, using the phrase “consistency mechanisms” where we use coordination:

    The first principle of successful scalability is to batter the consistency mechanisms down to a minimum, move them off the critical path, hide them in a rarely visited corner of the system, and then make it as hard as possible for application developers to get permission to use them [27].

The issue is not that coordination is tricky to implement, though that is true. The main problem is that coordination can dramatically slow down computation, or stop it altogether. Recent work showed that state-of-the-art multiprocessor key-value stores can spend 90% of their time waiting for coordination; a coordination-free implementation called Anna ran over two orders of magnitude faster by eliminating that coordination [47]. Key-value stores are simple systems with narrow APIs. Can we avoid coordination more generally, as Hamilton recommends? When?

Surprisingly, this was an open question in distributed systems until relatively recently, due to a narrow focus on storage semantics. We can do better by moving up the stack, setting aside incidental storage details and considering program semantics more holistically. Before we delve into details, we begin with intuition on what is desirable and what is possible.

1.2 Stay in Your Lane: The Perfect Freeway
As an analogy, consider driving on a highway during rush hour. If each car would drive forward independently in its lane at the speed limit, everything would be fine: the capacity of the highway could be fully exploited. Unfortunately, there always seem to be drivers who have other places to go than forward! To prevent two cars from being in the same place at the same time, we drivers engage in various forms of coordination when entering traffic, changing lanes, coming to intersections, etc. We adhere to formal protocols, including traffic lights and stop signs. We also frequently engage in ad hoc forms of coordination with neighboring cars by using turn signals, eye contact, and the familiar but subtle dance of driving our vehicles more or less aggressively. All these mechanisms have one thing in common: they slow us down when traffic is crowded. Worse, these slowdowns propagate back to the drivers behind us, and queuing effects amplify the problems. In the end, rush hour on the highway is a nightmare—wildly less efficient than the highway’s capacity¹.

The analogy to distributed systems is fairly direct. In principle, each machine or process in a system could proceed forward autonomously with its ordered list of instructions, and make progress as quickly as possible. But to avoid conflicts on shared state (akin to two cars being in the same place at the same time), distributed software employs coordination protocols to stay “safe”. The effect of these protocols is to cause one or more processes to idly wait until some other process successfully sends a signal saying it is done.

In many cases, however, coordination is not a necessary evil; it is an incidental requirement of a design decision. To return to our traffic analogy, consider stop lights: they allow drivers to mediate access to a shared intersection by following a waiting protocol. Stop light delays can be easily avoided by taking advantage of another dimension in space: an overpass or tunnel removes the intersection entirely. There is no endemic need to employ coordination in two dimensions via stop lights; they are just one engineering solution to a problem, with a particular tradeoff between cost of initial implementation and resulting throughput.

¹ As it happens, humans are not very good at simply driving forward at a fixed speed in their lane; but machines are [43]!
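The cost of global barriers can be made concrete with a toy back-of-the-envelope calculation (ours, not from the paper): under a barrier, every machine's progress is gated by the slowest participant, much like traffic backing up behind a stop light.

```python
def items_completed(speeds, time_budget, barrier):
    """Items finished per machine within `time_budget` time units.

    Each machine needs `speed` time units per item. With a global
    barrier, every machine waits for the slowest one between items,
    so all machines advance at the straggler's pace.
    """
    if barrier:
        pace = max(speeds)  # everyone waits for the straggler
        return [time_budget // pace for _ in speeds]
    return [time_budget // s for s in speeds]

speeds = [1, 2, 2, 10]  # one slow machine among four

free = sum(items_completed(speeds, 100, barrier=False))
coordinated = sum(items_completed(speeds, 100, barrier=True))
assert free == 100 + 50 + 50 + 10      # 210 items in total
assert coordinated == 4 * (100 // 10)  # 40: the straggler sets the pace
```

The numbers are arbitrary, but the shape of the result is not: adding a synchronization point can only reduce aggregate throughput, and the loss grows with variance in machine speeds.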
[Figures 1 and 2: a distributed waits-for graph over transactions T1–T6, and a distributed memory-reference graph over objects O1–O7 and a Root; only the node labels survive the text extraction.]
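The graphs in these figures can be modeled as plain edge sets. As a minimal Python sketch (ours, not code from the paper) of the two reachability computations they illustrate: deadlocks are witnessed by the existence of a cyclic path, while garbage is witnessed by the absence of any path from the root.

```python
from itertools import chain

def deadlocked(waits_for):
    """Transactions on some waits-for cycle (existence-of-path test)."""
    nodes = set(chain.from_iterable(waits_for))
    reach = {n: set() for n in nodes}  # transitive closure, to fixpoint
    changed = True
    while changed:
        changed = False
        for a, b in waits_for:
            new = {b} | reach[b]
            if not new <= reach[a]:
                reach[a] |= new
                changed = True
    return {n for n in nodes if n in reach[n]}  # n reaches itself: a cycle

def unreachable(objects, refs, root="Root"):
    """Objects with *no* path from the root (absence-of-path test)."""
    seen, frontier = {root}, [root]
    while frontier:
        cur = frontier.pop()
        for a, b in refs:
            if a == cur and b not in seen:
                seen.add(b)
                frontier.append(b)
    return objects - seen

# More waits-for edges can only reveal more deadlocks...
partial = {("T1", "T3"), ("T3", "T1")}
assert deadlocked(partial) <= deadlocked(partial | {("T5", "T6")})

# ...but new reference edges can retract an "O4 is garbage" verdict.
objs = {"O1", "O3", "O4"}
assert "O4" in unreachable(objs, {("O1", "O3")})
assert "O4" not in unreachable(objs, {("Root", "O1"), ("O1", "O3"), ("O3", "O4")})
```

The asymmetry exercised by the final two assertions is exactly the distinction the following sections develop: one answer set only grows as edges arrive, the other can shrink.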
1.3 Cruising and Stalling on Graphs
The Perfect Freeway is an idealistic analogy. We return our attention to examples from distributed computing, to illustrate when we can and cannot achieve the ideal of coordination-freeness. We consider two nearly identical classical distributed systems problems involving graph reachability—one coordination-free, one not.

1.3.1 Distributed Deadlock Detection. Distributed databases identify cycles in a distributed graph in order to detect and remediate deadlocks. In a traditional database system, a transaction Ti may be waiting for a lock held by another transaction Tj, which may in turn be waiting for a second lock held by Ti. The deadlock detector identifies such “waits-for” cycles by analyzing a directed graph in which nodes represent transactions, and edges represent one transaction waiting for another on a lock queue.

In a distributed database, a “local” (single-machine) view of the waits-for graph contains only a subset of the edges in the global waits-for graph. In this scenario, how do local deadlock detectors work together to identify global deadlocks?

Waits-for cycles may span machines, as in Figure 1. To identify these distributed deadlocks, each machine can exchange copies of its edges with other machines to accumulate more information about the global graph. Any time a machine observes a cycle in the information it has received so far, it can declare a deadlock among the transactions on that cycle.

We might be concerned that there are “race conditions” in this distributed computation. Do local detectors have to coordinate with other nodes to be sure of a deadlock they have observed? In this case, no coordination is required. To see this, note that decisions based on incomplete information are stable. For example, once Machine 1 and Machine 2 jointly identify a deadlock between T1 and T3, new information from Machine 3 will not change that fact. Additional facts can only result in additional cycles being detected: the output grows monotonically with the input. Finally, if all the edges are eventually shared across all machines, the machines will agree upon the outcome, which is based on the full graph.

1.3.2 Distributed Garbage Collection. Garbage collectors in distributed systems must identify unreachable objects in a distributed graph of memory references. Garbage collection works by identifying graph components that are disconnected from the “root” of a system runtime.

In a distributed system, references to objects can span machines. A local view of the reference graph contains only a subset of the edges in the global graph. How can multiple local garbage collectors work together to identify objects that are truly unreachable?

Note that a machine may have a local object and no knowledge whether the object is connected to the root—Machine 3 and object O4 in Figure 2 form an example. Yet there still may be a path to that object from the root that consists of edges distributed across other machines. Hence machines should exchange copies of edges to accumulate more information about the graph.

As before, we might be concerned that there are race conditions here. Can local collectors autonomously declare and deallocate garbage? Here, the answer is different: coordination is indeed required! To see this, note that a decision based on incomplete information—e.g., Machine 3 deciding that object O4 is unreachable in Figure 2—can be invalidated by the subsequent arrival of new information that demonstrates reachability (e.g., the edges Root → O1, O1 → O3, O3 → O4). The output does not grow monotonically with the input: previous “answers” may need to be retracted! To avoid this, a machine must ensure that it has heard everything there is to hear before it declares an object unreachable. The only way to know it has heard everything is to coordinate with all the other machines to establish that fact.

1.4 The Crux of Consistency: Monotonicity
These examples bring us back to our fundamental question, which applies to any concurrent computing framework:

Question: What is the family of problems that can be consistently computed in a distributed fashion without coordination, and what problems lie outside that family?

There is a difference between an incidental use of coordination and an intrinsic need for coordination: the former is the result of an implementation choice; the latter is a property of a computational problem. Hence our Question is one of computability, like P vs. NP or Decidability. It asks what is (im)possible for a clever programmer to achieve.

Note that the question assumes some definition of “consistency”. Where traditional work focused narrowly on memory consistency (i.e., reads and writes produce agreed-upon values), we want to focus on program consistency: does the program produce the outcome we expect (e.g., deadlocks detected, garbage collected), despite any race conditions that might arise?

Our examples provide clues for answering our question. Both depend on graph reachability, but they differ in one key aspect. A deadlock is identified by the existence of a (cyclic) path. Garbage is identified by the non-existence of a path. The set of satisfying paths that exist is monotonic in the information received:

Definition 1. A program P is monotonic if for any input sets S, T where S ⊆ T, P(S) ⊆ P(T).

By contrast, the set of satisfying paths that do not exist is non-monotonic: conclusions made on partial information may not hold in eventuality.

Monotonicity is the key property underlying the need for coordination to establish consistency, as captured in the CALM Theorem:

Theorem 1. Consistency As Logical Monotonicity (CALM). A program has a consistent, coordination-free distributed implementation if and only if it is monotonic.

Intuitively, monotonic programs are “safe” in the face of missing information, and can proceed without coordination. Non-monotonic programs, by contrast, must be concerned that the truth of a property could change in the face of new information. Therefore they cannot proceed until they know all information has arrived, requiring them to coordinate.

Additionally, because they “change their mind”, non-monotonic programs are order-sensitive: the order in which they receive information determines how they toggle state back and forth, which in turn determines their final state. By contrast, monotonic programs simply accumulate beliefs; their output depends only on the content of their input, not the order in which it arrives.

Our discussion so far has remained at the level of intuition. The next section provides a sketch of a proof of the CALM Theorem, including further discussion of definitions for consistency and coordination. Those seeking a formal proof are directed to the papers by Ameloot, et al. [8, 9].

2 CALM: A PROOF SKETCH
Our first challenge in formalizing the CALM Theorem is to define program consistency in a manner that allows us to reason about program outcomes, rather than mutations to storage. Having done that, we can move on to a proof that is more refined than those based on traditional memory consistency.

2.1 Program Consistency: Confluence
Distributed systems introduce significant non-determinism to our programs. Sources of non-determinism include unsynchronized parallelism, unreliable components, and networks with unpredictable delays. As a result, a distributed program can exhibit a large space of possible behaviors on a given input.

While we may not control all the behavior of a distributed program, our true concern is with its observable behavior: the program outcomes. To this end, we want to assess how distributed non-determinism affects program outcomes. A practical consistency question is this: “Does my program produce deterministic outcomes despite non-determinism in the runtime system?”

This is a question of program confluence. In the context of non-deterministic message delivery, an operation on a single machine is confluent if it produces the same set of outputs for any non-deterministic ordering and batching of a set of inputs. Following our discussion of sets of information S and T above, a confluent single-machine operation can be viewed as a deterministic function from sets to sets, abstracting away the nondeterministic order in which its inputs happen to appear in a particular run of a distributed system. Confluent operations compose: if the outputs of one confluent operation are consumed by another, the resulting composite operation is confluent. Hence confluence can be applied to individual operations, components in a dataflow, or even entire distributed programs [2]. If we restrict ourselves to building programs by composing confluent operations, our programs are confluent by construction, despite orderings of messages or execution races within and across components.

Unlike traditional memory consistency properties from the systems literature such as linearizability [30] and serializability [21], confluence makes no requirements or promises regarding notions of recency (e.g., a read is not guaranteed to return the result of the latest write request issued) or ordering of operations (e.g., writes are not guaranteed to be applied in the same order at all replicas). Nevertheless, if an application is confluent, we know that any such anomalies at the memory or storage level do not affect the application outcomes.

Confluence is a powerful yet permissive correctness criterion for distributed applications. It rules out application-level inconsistency due to races and non-deterministic delivery, while permitting non-deterministic ordering and timings of lower-level operations that may be costly (or sometimes impossible) to prevent in practice.

2.1.1 Confluent Shopping Carts. To illustrate the utility of reasoning about confluence, we consider an example of a higher-level application. In their paper on the Dynamo key-value store [20], researchers from Amazon describe a shopping cart application that achieves confluence without coordination. In their scenario, a client web browser requests items to add and delete from an online shopping cart. For availability and performance, the state of the cart is tracked by a distributed set of server replicas, which may receive requests in different orders. In the Amazon implementation, shopping performs no coordination, yet all server replicas eventually reach the same final state. The shopping cart is precisely the class of program that interests us: eventually consistent, even when implemented atop a non-deterministic distributed substrate that does no coordination.

Program consistency is possible in this case because the fundamental operations performed on the cart (e.g., adding items) commute, so long as the contents of the cart are represented as a set and the internal ordering of its elements is ignored. If two replicas disagree about the contents of the cart, their differing views can be reconciled simply by taking the union of their respective sets.

A complication in this context is that deletes are not monotonic and seem to cause consistency trouble: if instructions to add item I and delete item I arrive in different orders at different machines, the machines may disagree on whether I should be in the cart. As a traditional approach to avoid such “race conditions”, we might bracket every non-monotonic delete operation with coordination. Can we do better?

As a creative application-level use of monotonicity, a common technique is for deletes to be handled separately from inserts as another monotonically growing set of items [20, 42]. The sets of inserted and deleted items are both insert-only, and the insertions across the two commute. This would seem to solve our problem! Unfortunately, while additions and deletions commute, neither operation commutes with checkout—if a checkout message arrives before some updates, those updates will be lost.

Even if we stop here, our lens provided a win: monotonicity allows shopping to be coordination-free, even though checkout still requires coordination. This is the conclusion of the Dynamo design. In later work [18], we go further to make checkout monotonic in this setting as well: the checkout operation is enhanced with a manifest from the client of all its update message IDs that preceded the checkout message; replicas can delay processing of the checkout message until they have processed all updates in the manifest.

This design evolution illustrates the theme we seek to clarify. Rather than micro-optimize protocols to protect race conditions in procedural code, modern distributed systems creativity often involves minimizing the use of such protocols.

2.2 A Sketch of The Proof
The CALM conjecture was presented in a keynote talk at PODS 2010 and written up shortly thereafter alongside a number of corollaries [29]. In a subsequent series of papers [8, 9, 48], Ameloot and colleagues presented a formalization and proof of the CALM Theorem which remains the reference formalism at this time. Here we briefly review the structure of the argument from Ameloot, et al.

To capture the notion of a distributed system composed out of monotonic (or non-monotonic) logic, Ameloot uses the formalism of a relational transducer [1] running on each machine in a network. Simply put, a relational transducer is an event-driven server with a relational backing store and programs written as queries. Each transducer runs a sequential event loop as follows:

(1) Ingest and apply an unordered batch of requests to insert and delete records in local relations. Requests may come from other machines or a distinguished input relation.
(2) Query the (now-updated) local relations to compute batches of records that should be sent somewhere (possibly locally) for handling in the future.
(3) Send the results of the query phase to relevant machines in the network as requests to be handled. Results sent locally are ingested in the very next iteration of the event loop. Results can also be “sent” to a distinguished output.

Figure 3: A simple four-machine relational transducer network with one machine’s state and event loop shown in detail.

The Send phase knows where to send records based on their data content: the records contain addresses of other machines in the network. In essence, a programmer in this environment “issues a request to send a message to machine n” by causing a record containing the address of n to be Ingested, and writing a Query that will read that record and generate the relevant output for the Send phase².

² This paradigm has been used in a number of languages for Declarative Networking like Overlog and NDlog [37, 38], as well as in the Bloom language for distributed programming [3].

The next challenge is to define monotonicity carefully. In Relational Transducers, “programs expressible in monotonic logic” are easy to define: they are the transducer networks where every machine’s queries are syntactically monotonic relational queries. For instance, in the relational algebra, we can allow each machine to employ selection, projection, intersection, join and transitive closure (the monotonic operators of relational algebra), but not set-difference (the sole non-monotonic operator). If we use relational logic, we disallow the use of universal quantifiers (∀) and their negation-centric equivalent (¬∃)—precisely the construct that tripped us up in the garbage collection example of Section 1.3.2 (“everything there is to hear”). If we model our programs with mutable relations, insertions are allowable, but in general updates and deletions are not [5, 35]. These informal descriptions elide a number of clever exceptions to these rules that still achieve semantic monotonicity despite syntactic non-monotonicity [8, 18], but they give a sense of how the formalism is defined.

Now that we have a formal execution model (relational transducers), a definition of consistency (confluence), and a definition of monotonic programs, we are prepared to prove a version of the CALM Theorem. The forward “if” direction of the CALM Theorem is quite straightforward and similar to our previous discussion: it is easy to show that any monotonic relational transducer in the network will eventually Ingest and Send a deterministic set of messages, and generate a deterministic output.

The reverse “only if” direction is quite a bit trickier, as it requires ruling out any possible scheme for avoiding coordination. The first challenge is to formally define “coordination” messages, and distinguish them from other forms of message passing that satisfy data dependencies needed to compute an output. To do this, Ameloot, et al. consider all possible ways to partition data across machines in the network at program start. From each of these starting points, a messaging pattern is produced during execution of the program. We say that a program contains coordination if it requires messages to be sent under all possible partitionings—including partitionings that co-locate all data at a single machine. Any message that is sent in every partitioning is a coordination message. As an example, consider how a distributed garbage collector decides if a locally disconnected object Og is garbage. Even if all the data is placed at a single machine, that machine needs to exchange messages with the other machines to check that they have no more additional edges—it needs to “coordinate”, not just communicate data dependencies. The proof then proceeds to show that non-monotonic operations require this kind of coordination.

This brief description elides many interesting aspects of the original paper. In addition to the connections established between monotonicity and coordination-freeness, connections are also made between these properties and other distributed systems properties. Of particular note is the issue of distributed agreement on network membership (represented by Ameloot, et al. as the All relation). Network membership is a classic challenge in distributed systems, and the complicating factor in many classic distributed protocols. It is shown that the class of monotonic programs is the same as the class of programs that do not require knowledge of network membership—they do not query All. A similar connection is shown with the property of a machine being aware of its own identity/address (querying the Id relation).

3 CALM PERSPECTIVE ON THE STATE OF THE ART
The CALM theorem describes what is and is not possible. But can we use it practically? In this section, we address the implications of CALM with respect to the state of the art in distributed systems practice. It turns out that many patterns for maintaining consistency follow directly from the theorem.

3.1 CAP and CALM: Going Positive
Brewer’s CAP Theorem [14] informally states that a system can exhibit only two out of the three following properties: Consistency, Availability, and Partition-tolerance. CAP is a negative result: it captures consistency properties that cannot be achieved in general. But Brewer frames this with constructive advice:

    [The original] expression of CAP served its purpose, which was to open the minds of designers to a wider range of systems and tradeoffs ... The modern CAP goal should be to maximize combinations of consistency and availability that make sense for the specific application. [14]

CALM is a positive result in this arena: it circumscribes the class of programs for which all three of the CAP properties can indeed be achieved simultaneously. To see this, note the following:

Observation 1. Coordination-freeness is equivalent to availability under partition.

In the forward direction, a coordination-free program is by definition available under partition: all machines can proceed independently. When and if the partition heals, state merger is monotonic and consistent. In the reverse direction, a program that employs coordination will stall (become unavailable) during coordination protocols if the machines involved in the coordination span the partition.

In that frame, CALM asks and answers the underlying question of CAP: “which programs can be consistently computed while remaining available under partition?”. CALM does not contradict CAP. Instead, CALM approaches distributed consistency from a wider frame of reference:

(1) First, CAP is a negative result over the space of all programs: CALM confirms this coarse result, but delineates at a finer grain the negative and positive cases. Monotone programs can in fact satisfy all three of the CAP properties at once; non-monotone programs are the ones that cannot.
(2) The key insight in CALM is to focus on consistency from the viewpoint of program outcomes rather than the traditional histories of storage mutation. The emphasis on the program being computed shifts focus from implementation to specification: it allows us to ask questions about what computations are possible.

The latter point is what motivated our outcome-oriented definition of program consistency. Where the CAP Theorem proofs of Gilbert and Lynch [24] choose linearizability of updates to storage, the CALM Theorem proofs choose confluence of program outcomes. We note that confluence is both more permissive and closer to user-observable properties. CALM provides the formal framework for the widespread intuition that we can indeed “work around CAP” in many cases, even if we violate traditional systems-level notions of storage consistency.

3.2 Distributed Design Patterns
Our shift of focus from mutable storage to program semantics has implications beyond proofs. It also informs the design of better programming paradigms for distributed computing.

Traditional programming models the world as a collection of named variables whose values change over time. Bare assignment [10] is a non-monotonic programming construct: outputs based on a prefix of assignments may have to be retracted when new assignments come in. Similarly, assignments make final program states dependent upon the arrival order of inputs. This makes it extremely hard to take advantage of the CALM theorem to analyze systems written in traditional imperative languages!

Functional programming has long promoted the use of immutable variables, which are constrained to take on only a single value during a computation. Viewed through the lens of CALM, an immutable variable is a simple monotonic pattern: it transitions from being undefined to its final value, and never goes back. Immutable variables generalize to immutable data structures; techniques such as deforestation [45] make programming with immutable trees, lists and graphs practical.

Monotonic programming patterns are common in the design of distributed storage systems. We already discussed the Amazon shopping cart for Dynamo, which models cart state as two growing sets. A related pattern in storage systems is the use of tombstones: special data values that mark a data item as deleted. Instead of explicitly allowing deletion (a non-monotonic construct), tombstones mask immutable values with corresponding immutable tombstone values. Taken together, a data item with a tombstone monotonically transitions from undefined, to a defined value, and ultimately to tombstoned.
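A minimal sketch of the two-growing-sets pattern (ours; not code from Dynamo or any cited system, and using a simple tombstone-wins rule): replica state is a pair of insert-only sets, so it is insensitive to delivery order, and replicas merge by set union.

```python
from itertools import permutations

def apply_op(state, op):
    """State is a pair of grow-only sets: (added, tombstones)."""
    added, tombs = state
    kind, item = op
    if kind == "add":
        return (added | {item}, tombs)
    return (added, tombs | {item})    # delete = write a tombstone

def contents(state):
    added, tombs = state
    return added - tombs              # derived view; state never shrinks

def merge(a, b):
    """Replicas that saw different requests reconcile by set union."""
    return (a[0] | b[0], a[1] | b[1])

ops = [("add", "I"), ("del", "I"), ("add", "J")]

# Confluence of the state: every delivery order yields the same sets.
states = set()
for order in permutations(ops):
    s = (frozenset(), frozenset())
    for op in order:
        s = apply_op(s, op)
    states.add(s)
assert len(states) == 1

# Two replicas that each saw only part of the input still converge.
r1 = apply_op((frozenset(), frozenset()), ("add", "I"))
r2 = apply_op((frozenset(), frozenset()), ("del", "I"))
assert contents(merge(r1, r2)) == contents(merge(r2, r1)) == set()
```

Under this rule a tombstoned item can never be re-added; production systems refine the semantics (e.g., with unique identifiers per insertion), but the skeleton of two monotonically growing sets is the same.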
Conflict-free replicated data types (CRDTs) [42] provide an object-oriented framework for monotonic programming patterns like tombstones, typically for use in the context of replicated state. A CRDT is an abstract data type whose internal state is a lattice that evolves monotonically according to a partial order, such as the partial order of set containment under ⊆ or of integers under ≤. Two replicas of a CRDT converge to the same state regardless of the order of their inputs. Equally importantly, the states of two CRDT replicas that may have seen different inputs and orders can always be deterministically merged into a new final state that incorporates all of the inputs seen by both.

CRDTs are an OO lens on a long tradition of prior work that exploits commutativity to achieve determinism under concurrency. This goes back at least to long-running transactions [16, 23], continuing through recent work on the Linux kernel [17]. The benefits of commutativity have motivated not only abstract data types, but also composable libraries or languages, enabling programmers to reason about correctness of whole programs [3, 34, 39]. We turn to an example of that idea next.

3.3 The Bloom Programming Language
One way to encourage good distributed design patterns is to use a language specifically centered around those patterns. Bloom is a programming language we designed in that vein.

The main goal of Bloom is to make distributed systems easier to reason about and program. We felt that a good language for a domain is one that obscures irrelevant details and brings into sharp focus those that matter. Given that data consistency is a core challenge in distributed computing, we designed Bloom to be data-centric: both system state and events are represented as named data, and computation is expressed as queries over that data. The programming model of Bloom closely resembles that of the relational transducers described in Section 2.2. From the programmer’s perspective, Bloom resembles event-driven or actor-oriented programming—Bloom programs use reorderable query-like handler statements to describe how an agent responds to messages (represented as data) by reading and modifying local state and by sending messages.

Because Bloom programs are written in a relational-style query language, monotonicity is easy to spot just as it was in relational transducers. The relatively uncommon non-monotonic operations such as anti-join and set minus stand out in the language’s syntax.

… of a language-based approach to monotonic programming: local, state-centric guarantees can be automatically composed into global, outcome-oriented, program-level guarantees.

With Bloom as a base, we have developed tools including declarative testing frameworks [4], verification tools [6], and program transformation libraries that add coordination to programs that cannot be statically proven to be confluent [2].

3.4 Coordination In Its Place
Pragmatically, it can be difficult to find a monotonic implementation of a full-featured application. Instead, a good strategy is to keep coordination off of the critical path. In the shopping cart example, coordination was limited to checkout, when user performance expectations are lower. In the garbage collection example (assuming adequate resources) the task can run in the background without affecting users.

It can take creativity to move coordination off of the critical path and into a background task. The most telling example from Section 3.2 is the use of tombstoning for low-latency deletion. In practice, memory for tombstoned items must be reclaimed, so eventually all machines need to agree to delete some items. Like GC, this distributed deletion can be coordinated lazily in the background on a rolling basis. In this case, monotonic design does not stamp out coordination entirely; it moves it off the critical path.

Another non-obvious use of CALM analysis is to identify when to compensate (“apologize” [28]) for inconsistency, rather than prevent it via coordination. For example, when a retail site allows you to purchase an item, it should decrement the count of items in inventory. This non-monotonic action suggests that coordination is required, e.g., to ensure that the supply is not depleted before an item is allocated to you. In practice, this requires too much integration between systems for inventory, supply chain, and shopping. In the absence of such coordination, your purchase may fail non-deterministically after checkout. To account for this possibility, additional compensation code must be written to detect the out-of-stock exception, and handle it by—for example—sending you an apologetic email with a loyalty coupon. Note that a coupon is not a clear mathematical inverse of any action in the original program; domain-aware compensation often goes beyond typical type system logic.

In short, we do not advocate pure monotonic programming as the only way to build efficient distributed systems. Monotonic-
In addition, Bloom’s types include CRDT-like lattices that provide ity also has utility as an analysis framework for identifying non-
object-level commutativity, associativity and idempotence. determinism so that programmers can address it creatively.
The advantages of the Bloom design are twofold. First, Bloom
makes set-oriented, monotonic (and hence confluent) program- 4 QUESTIONS
ming the easiest constructs for programmers to work with in the The CALM Theorem provides a “bright line” between problems
language. Contrast this with imperative languages, in which assign- that require coordination and those that do not. In addition to the
ment and explicit sequencing of instructions—two non-monotone constructive directions sketched above, CALM also raises a number
constructs!—are the most natural and familiar building blocks for of questions at the heart of distributed systems theory and pratice.
programs. Second, Bloom can leverage static analysis based on
CALM to certify when programs provide the state-based conver- 4.1 Expressiveness
gence properties provided by CRDTs, and when those properties
are preserved across compositions of modules. This is the power Typically, when we define a family of computations, we expect a
characterization of the expressive power of that family. What is the
3 This is no coincidence: both Bloom and Ameloot’s transducer work are based on a expressive power of the monotone distributed programs from the
relational logic for distributed systems called Dedalus [5]. CALM Theorem?
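A canonical member of this family is transitive closure, which is expressible as a monotone logic program (Datalog without negation). A minimal Python sketch, assuming a hypothetical single-node simulation (the function names are ours, not Bloom or Dedalus code), illustrates the confluence that CALM promises for monotone programs: intermediate states depend on message arrival order, but the final fixpoint does not.

```python
from itertools import permutations

def run(edge_stream):
    """Monotone program: facts only accumulate, never retract.

    After each edge 'message' arrives, re-derive reachability facts
    to a fixpoint. The derivation rule is the usual transitive step:
    reach(a, d) if reach(a, b) and reach(b, d).
    """
    facts = set()
    for edge in edge_stream:          # messages may arrive in any order
        facts.add(edge)
        while True:
            new = {(a, d)
                   for (a, b) in facts
                   for (c, d) in facts if b == c} - facts
            if not new:               # fixpoint reached for this batch
                break
            facts |= new              # state grows monotonically
    return frozenset(facts)

edges = [(1, 2), (2, 3), (3, 4)]
# Confluence: every arrival order yields the same final state.
results = {run(order) for order in permutations(edges)}
assert len(results) == 1
assert (1, 4) in next(iter(results))
```

The same program run on non-monotone state (say, with retractions) would not enjoy this guarantee, which is exactly the distinction the expressiveness question probes.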
This is a question of descriptive complexity, and one landmark result in that space is the Immerman-Vardi Theorem [31, 44]. In a nutshell, Immerman-Vardi states that if you take a suitably defined class of monotone logic programs (where negation is allowed only on pre-defined, stored relations) and provide some successor relation that provides a total order, the resulting language can express all of PTIME.

So one natural question is this: can we implement all of PTIME in a coordination-free manner? Do the conditions of the Immerman-Vardi Theorem align with the conditions of the CALM Theorem? Intuitively, the answer would appear to be "no". One concern is that Immerman-Vardi's requirement for a successor relation is an unreasonable assumption for a distributed system. Indeed, coordination protocols like Paxos were designed precisely to achieve such a totally ordered sequence in a distributed system. But what if we made different, pragmatic assumptions about what can be assumed in a distributed system: e.g., a successor relation per node, and causal ordering across nodes? How large a complexity class could we achieve? The specifics of the definitions of the computing model and desired guarantees are critical to the question of what is achievable.

The state of the art in this direction is captured by Ameloot and Van den Bussche [7]. For example, if all machines know the rules for partitioning data across the system, certain syntactically non-monotone programs can be treated as monotone and run coordination-free. It would seem plausible that the class of programs that can be practically made coordination-free could be expanded even further with other common system assumptions.

by wrapping them with coordination logic. But the resulting repaired code still contains non-monotonic statements. Can we write program checks that will verify the consistency of such code?

One underlying challenge here is that coordination does not remove non-determinism; it controls non-determinism across the system. For example, Paxos is often used to impose an order for concurrent events in a distributed system; this ensures uniform decisions across machines in one run of the system, but another run might produce a different outcome. Hence our definition of consistency as confluence does not precisely capture the effect of coordination in non-monotonic programs. Declarative constructs like Saccà and Zaniolo's choice operator [41] may be useful to provide both a semantics and a syntax for capturing the idea of controlled non-determinism without resorting to operational reasoning.

As discussed in Section 3.4, sometimes the desired solution to non-monotonic code is to implement compensation rather than coordination. Again, the repaired code still contains the original non-monotonic logic, and the program specification is enhanced to achieve some notion of acceptable non-determinism: every customer's outcome non-deterministically satisfies an exclusive choice among acceptable properties. This bears some resemblance to the previous discussion of choice being made by coordination; it would be interesting if coordination and compensation could be up-leveled to a single more general semantic concept of eventual non-deterministic agreement. With such a concept explicitly identified, perhaps it could be represented linguistically in such a way that repaired programs could be checked for correctness.
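The "exclusive choice among acceptable properties" can be made concrete with a small sketch, assuming hypothetical names for the retail example from Section 3.4 (none of this code is from the paper). Without coordination, which customers receive stock is non-deterministic across runs, yet a checker can still verify that every customer's outcome lands in the acceptable set: either fulfilled, or compensated with an apology coupon.

```python
import random

def checkout(order_qty, stock):
    """Non-coordinated decrement; compensate on shortfall."""
    if stock["count"] >= order_qty:
        stock["count"] -= order_qty
        return "fulfilled"
    # Compensation is not an inverse of any prior action; it is an
    # alternative outcome the specification declares acceptable.
    return "apologized_with_coupon"

def run_once(seed):
    random.seed(seed)
    stock = {"count": 3}
    customers = ["a", "b", "c", "d", "e"]
    random.shuffle(customers)      # arrival order is non-deterministic
    return {c: checkout(1, stock) for c in customers}

outcomes = [run_once(s) for s in range(10)]
# Which customers are fulfilled varies from run to run, but every run
# satisfies the exclusive choice among acceptable properties:
for run in outcomes:
    assert all(o in {"fulfilled", "apologized_with_coupon"}
               for o in run.values())
    assert sum(o == "fulfilled" for o in run.values()) == 3  # no overselling
```

A program check of this flavor verifies the enhanced specification (acceptable non-determinism) rather than confluence, which is one way the repaired program could be "checked for correctness" despite its non-monotonic core.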