Chapter 7 - Consistency and Replication
Introduction
Data are generally replicated to enhance reliability and to
improve performance
But replication may introduce inconsistency among the copies
Consistency models for shared data are often hard to
implement in large-scale distributed systems
Hence simpler models, such as client-centric consistency
models, are used
Objectives of the Chapter
we discuss
why replication is useful and its relation to scalability
consistency models for shared data, originally designed for
parallel computers, which are also useful in distributed
shared memory systems
client-centric consistency models
how consistency and replication are implemented
7.1 Reasons for Replication
two major reasons: reliability and performance
reliability
if a file is replicated, we can switch to another replica
when one of them crashes
replication also provides better protection against
corrupted data; similar to mirroring in non-distributed
systems
performance
if the system has to scale in size or geographical area,
place a copy of the data in the proximity of the processes
using it, reducing access time and improving performance;
for example, a Web server accessed by thousands of clients
from all over the world
caching is strongly related to replication; it is normally
done by clients
Replication as Scaling Technique
replication and caching are widely applied as scaling
techniques
processes can use local copies and limit access time and
traffic
however, the copies need to be kept consistent; this may
1. require more network bandwidth
if the copies are refreshed more often than they are read
(low access-to-update ratio), the cost in bandwidth
outweighs the benefit: many propagated updates are never
used
2. itself be subject to serious scalability problems
intuitively, a read operation made on any copy should
return the same value (the copies are always the same)
thus, when an update operation is performed on one
copy, it should be propagated to all copies before a
subsequent operation takes place
this is sometimes called tight consistency (a write is
performed at all copies in a single atomic operation or
transaction)
difficult to implement, since all replicas first need to
reach agreement on when exactly an update is to be performed
locally, say by deciding on a global ordering of operations
using Lamport timestamps; this takes a lot of communication
time
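as a side note, the Lamport-timestamp mechanism mentioned above can be sketched in a few lines of Python; this is an illustration only (the LamportClock class and the two-replica scenario are assumptions, not from the slides):

```python
# a minimal Lamport-clock sketch (illustrative; not from the slides)
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # local event or message send: advance the clock
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # message receipt: jump past the sender's timestamp
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.tick()        # replica A timestamps its update: 1
b.receive(t)        # replica B learns of the update: clock becomes 2
print(b.tick())     # 3: later events at B are ordered after A's update
```

the communication needed to agree on such a global order at every update is exactly what makes tight consistency expensive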
dilemma
scalability problems can be alleviated by applying
replication and caching, leading to a better performance
but, keeping copies consistent requires global
synchronization, which is generally costly in terms of
performance
solution: loosen the consistency constraints
updates no longer need to be executed as atomic operations
(no more instantaneous global synchronization), but copies
may not always be the same everywhere
to what extent the consistency can be loosened depends
on the specific application (the purpose of data as well as
access and update patterns)
7.2 Data-Centric Consistency Models
consistency has always been discussed
in terms of read and write operations on shared data
available by means of (distributed) shared memory, a
(distributed) shared database, or a (distributed) file system
we use the broader term data store, which may be physically
distributed across multiple machines
assume also that each process has a local copy of the data
store and write operations are propagated to the other copies
the general organization of a logical data store, physically distributed and replicated across multiple
processes
a consistency model is a contract between processes and the
data store
processes agree to obey certain rules
then the data store promises to work correctly
ideally, a process that reads a data item expects a value that
shows the results of the last write operation on the data
in a distributed system and in the absence of a global clock
and with several copies, it is difficult to know which is the last
write operation
to simplify implementation, each consistency model restricts
the values that a read operation may return
data-centric consistency models to be discussed
1. Sequential Consistency
2. Causal Consistency
3. Entry Consistency
the following notations and assumptions will be used
Wi(x)a means process Pi has written the value a to data item x
Ri(x)b means process Pi has read data item x, returning the value b
the process index may be omitted when it is clear which
process is accessing the data
assume that initially each data item is NIL
1. Sequential Consistency
a data store is said to be sequentially consistent when it
satisfies the following condition:
The result of any execution is the same as if the (read and
write) operations by all processes on the data store were
executed in some sequential order and the operations of
each individual process appear in this sequence in the
order specified by its program
i.e., all processes see the same interleaving of operations
time does not play a role; no reference to the “most recent”
write operation
example: four processes operating on the same data item x
to understand sequential consistency better consider the
following example
assume three concurrently executing processes and three
data items (integers) stored in a sequentially consistent
data store
each variable is assumed to be initialized to 0
if we concatenate the outputs of P1, P2, and P3 in that
order, we get a 6-bit signature of the execution
there are a total of 2^6 = 64 possible signatures
not all 64 signatures are valid; for example
000000 is not valid; it means all prints are done before all
assignments, violating the requirement that statements are
executed in program order
001001 is impossible; 00 means P1 executes before P3 and 01
means P3 executes before P1
the 90 different valid statement orderings produce a variety
of results (fewer than 64) that are allowed under the
assumption of sequential consistency
all processes must accept these as valid results and work correctly,
which is the contract between them and the data store
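for concreteness, a small sketch that enumerates these interleavings, assuming the classic version of this example in which each process assigns 1 to one variable and then prints the other two (this exact program is an assumption; the slides only describe the setup):

```python
from itertools import permutations

# each process: an assignment followed by a print of the other two
# variables; the two operations must stay in program order
PROGRAMS = {
    "P1": [("write", "x"), ("print", ("y", "z"))],
    "P2": [("write", "y"), ("print", ("x", "z"))],
    "P3": [("write", "z"), ("print", ("x", "y"))],
}

ops = [(p, i) for p, prog in PROGRAMS.items() for i in range(len(prog))]

signatures = set()
orderings = 0
for perm in permutations(ops):
    # keep only interleavings that respect each process's program order
    if any(perm.index((p, 0)) > perm.index((p, 1)) for p in PROGRAMS):
        continue
    orderings += 1
    mem = {"x": 0, "y": 0, "z": 0}     # all variables start at 0
    out = {p: "" for p in PROGRAMS}
    for p, i in perm:
        kind, arg = PROGRAMS[p][i]
        if kind == "write":
            mem[arg] = 1
        else:
            out[p] += "".join(str(mem[v]) for v in arg)
    signatures.add(out["P1"] + out["P2"] + out["P3"])

print(orderings)        # 90 valid statement orderings
print(len(signatures))  # the distinct valid signatures (fewer than 64)
```

filtering the 6! = 720 permutations down to those respecting program order leaves exactly the 90 valid orderings mentioned above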
2. Causal Consistency
it is a weakening of sequential consistency
it distinguishes between events that are potentially causally
related and those that are not
example: a write on y that follows a read on x; the writing
of y may have depended on the value of x; e.g., y = x+5
otherwise the two events are concurrent
two processes write two different variables
if event B is caused or influenced by an earlier event, A,
causality requires that everyone else must first see A, then
B
a data store is said to be causally consistent, if it obeys the
following condition:
Writes that are potentially causally related must be seen
by all processes in the same order. Concurrent writes
may be seen in a different order on different machines.
example: W2(x)b and W1(x)c are concurrent writes, so there
is no requirement for processes to see them in the same
order (in the original figure, CR marks the causally-related
writes and Conc marks the concurrent ones)
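a common way to make "potentially causally related" precise is with vector clocks; below is a minimal sketch (an illustration only; the slides do not prescribe any particular mechanism, and the clock values are made up):

```python
# write A is causally before write B iff A's vector clock is <= B's
# component-wise and the clocks differ
def causally_before(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    return not causally_before(a, b) and not causally_before(b, a)

# P1 performs W1(x)a; P2 reads a and then performs W2(x)b:
# W2 depends on W1, so every process must apply W1 before W2
w1 = (1, 0)   # P1's clock after its write
w2 = (1, 1)   # P2's clock after reading a and writing b
print(causally_before(w1, w2))   # True

# two writes issued without any read in between are concurrent,
# so different replicas may apply them in different orders
wa = (2, 0)
wb = (1, 1)
print(concurrent(wa, wb))        # True
```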
3. Entry Consistency
entry consistency requires each ordinary shared data item to
be associated with a synchronization variable; acquiring the
synchronization variable brings only its associated data up
to date
synchronization variable ownership
each synchronization variable has a current owner, the
process that acquired it last
the owner may enter and exit critical sections
repeatedly without sending messages
other processes must send a message to the current
owner asking for ownership and the current values of
the data associated with that synchronization variable
several processes can also simultaneously own a
synchronization variable, but only for reading
a data store exhibits entry consistency if it meets all the
following conditions:
An acquire access of a synchronization variable is not
allowed to perform with respect to a process until all
updates to the guarded shared data have been performed
with respect to that process. (at an acquire, all remote
changes to the guarded data must be made visible)
Before an exclusive mode access to a synchronization
variable by a process is allowed to perform with respect to
that process, no other process may hold the
synchronization variable, not even in nonexclusive mode.
After an exclusive mode access to a synchronization
variable has been performed, any other process's next
nonexclusive mode access to that synchronization variable
may not be performed until it has performed with respect to
that variable's owner. (it must first fetch the most recent
copies of the guarded shared data)
a valid event sequence for entry consistency
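a minimal single-machine sketch of the acquire side of this model (the GuardedItem class and the fetch_latest callback are hypothetical; only the association of a synchronization variable with its guarded data comes from the slides):

```python
import threading

class GuardedItem:
    """One shared data item guarded by its own synchronization variable;
    fetch_latest is a hypothetical callback asking the current owner
    for the most recent value of the guarded data."""
    def __init__(self, value=None):
        self.sync = threading.Lock()   # the synchronization variable
        self.value = value             # the guarded shared data

    def acquire(self, fetch_latest):
        self.sync.acquire()            # exclusive: no other owner allowed
        # entry consistency: before the acquire completes, all updates
        # to THIS item (and only this item) are made visible locally
        self.value = fetch_latest()

    def release(self):
        self.sync.release()

x = GuardedItem()
x.acquire(lambda: 42)   # pull the latest remote value of x on acquire
x.value += 1            # operate on the guarded data in the critical section
x.release()
print(x.value)          # 43
```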
7.3 Client-Centric Consistency Models
with many applications, updates happen very rarely
for these applications, data-centric models, which give high
importance to ordering every update, are not suitable
very weak consistency is generally sufficient for such
systems
Eventual Consistency
there are many applications where few processes (or a
single process) update the data while many read it and
there are no write-write conflicts; we need to handle only
read-write conflicts; e.g., DNS server, Web site
for such applications, it is even acceptable for readers to
see old versions of the data (e.g., cached versions of a
Web page) until the new version is propagated
with eventual consistency, it is only required that updates
are guaranteed to gradually propagate to all replicas
data stores that are eventually consistent have the property
that in the absence of updates, all replicas converge toward
identical copies of each other
write-write conflicts are rare and can be handled separately
the problem with eventual consistency arises when different
replicas are accessed; e.g., a mobile client accessing a
distributed database may get an older version of the data
when it uses a new replica as a result of changing location
the principle of a mobile user accessing different replicas of a distributed database
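1. Monotonic Reads
a data store is said to provide monotonic-read consistency,
if the following condition holds:
If a process reads the value of a data item x, any
successive read operation on x by that process will always
return that same value or a more recently written value
e.g., a user reading email first at one location and then at
another should never see messages disappear from the mailbox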
2. Monotonic Writes
it may be required that write operations propagate in the
correct order to all copies of the data store
in a monotonic-write consistent data store the following
condition holds:
A write operation by a process on a data item x is
completed before any successive write operation on x by
the same process
completing a write operation means that the copy on which
a successive operation is performed reflects the effect of a
previous write operation by the same process, no matter
where that operation was initiated
monotonic-write consistency may not be necessary if a later
write operation completely overwrites the previous one
x = 78;
x = 90;
there is no need to make sure that x has first been changed
to 78
it is important only if part of the state of the data item
changes
e.g., a software library, where one or more functions are
replaced, leading to a new version
3. Read Your Writes
a data store is said to provide read-your-writes consistency, if
the following condition holds:
The effect of a write operation by a process on data item x
will always be seen by a successive read operation on x by
the same process
i.e., a write operation is always completed before a successive
read operation by the same process, no matter where that
read operation takes place
the absence of read-your-writes consistency is often
experienced when a Web page is modified using an editor and
the modification is not seen on the browser due to caching;
read-your-writes consistency guarantees that the cache is
invalidated when the page is updated
7.4 Replica Management
key issues in replication: deciding where, when, and by
whom replicas should be placed and how to keep replicas
consistent
placement refers to two issues: placing of replica servers
(or finding the best locations), and that of placing content
(or finding the best servers)
a. Replica-Server Placement
how to select the best K out of N locations where K < N
two possibilities (there are more)
based on distance between clients and locations where
distance can be measured in terms of latency or
bandwidth
or considering the topology of the Internet as formed by
Autonomous Systems (ASes); an AS is a network managed by a
single organization in which all nodes run the same routing
protocol; then place the servers on the K routers with the
largest number of network interfaces
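a hedged sketch of the first, distance-based option as a greedy heuristic (the greedy strategy and the latency matrix are illustrative assumptions; the slides do not fix an algorithm):

```python
def place_replicas(latency, k):
    """latency[c][s]: measured latency from client c to candidate
    site s; greedily pick the k sites that most reduce the total
    latency clients see to their nearest chosen replica."""
    clients = range(len(latency))
    sites = range(len(latency[0]))
    chosen = set()
    for _ in range(k):
        def cost(extra):
            picked = chosen | {extra}
            return sum(min(latency[c][s] for s in picked) for c in clients)
        best = min((s for s in sites if s not in chosen), key=cost)
        chosen.add(best)
    return chosen

# 3 clients x 4 candidate sites; select the best K = 2 of N = 4
latency = [[10, 50, 40, 30],
           [60, 10, 30, 40],
           [50, 40, 10, 60]]
print(place_replicas(latency, 2))   # {0, 2}
```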
b. Content Replication and Placement
three types of replicas:
permanent replicas
server-initiated replicas
client-initiated replicas
the logical organization of different kinds of copies of a data store into three concentric rings
i. Permanent Replicas
the initial set of replicas that constitute a distributed
data store; normally a small number of replicas
e.g., a Web site: two forms
the files that constitute a site are replicated across a
limited number of servers on a LAN; a request is
forwarded to one of the servers
mirroring: a Web site is copied to a limited number
of servers, called mirror sites, which are
geographically spread across the Internet; clients
choose one of the mirror sites
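ii. Server-Initiated Replicas
created at the initiative of the (owner of the) data store,
e.g., by a Web hosting service, to enhance performance;
replicas are dynamically created, moved, or deleted so as to
be close to groups of clients that issue many requests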
iii. Client-Initiated Replicas (client caches or simply caches)
to improve access time
a cache is a local storage facility used by a client to
temporarily store a copy of the data it has just received
placed on the same machine as its client or on a
machine shared by clients on a LAN
managing the cache is left entirely to the client; the
data store from which the data have been fetched has
nothing to do with keeping cached data consistent
c. Content Distribution
updates are initiated at a client, forwarded to one of the
copies, and propagated to the replicas ensuring consistency
some design issues in propagating updates
state versus operations
pull versus push protocols
unicasting versus multicasting
i. State versus Operations
what is actually to be propagated? three possibilities
send a notification of the update only (used by invalidation
protocols; useful when the read/write ratio is small); uses
little bandwidth
transfer the modified data (useful when the read/write ratio
is high)
transfer the update operation (also called active
replication); it assumes that each replica knows how to
perform the operation; uses little bandwidth, but more
processing power is needed at each replica
ii. Pull versus Push Protocols
push-based approach (also called server-based protocols):
the server propagates updates to other replicas without
those replicas even asking for them (used when a high degree
of consistency is required and the read/write ratio is high)
pull-based approach (also called client-based protocols):
often used by client caches; a client or a server requests
updates from the server whenever needed (used when the
read/write ratio is low)
a comparison between push-based and pull-based protocols;
for simplicity assume multiple clients and a single server
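issue: state at server; push-based: list of client replicas
and caches; pull-based: none
issue: messages sent; push-based: update (and possibly fetch
update later); pull-based: poll and update
issue: response time at client; push-based: immediate (or
fetch-update time); pull-based: fetch-update time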
7.5 Consistency Protocols
so far we have concentrated on various consistency
models and general design issues
consistency protocols describe an implementation of a
specific consistency model
there are three types of protocols for data-centric
consistency models
primary-based protocols
remote-write protocols
local-write protocols
replicated-write protocols
active replication
quorum-based protocols
cache-coherence protocols
1. Primary-Based Protocols (for sequential consistency)
each data item x in the data store has an associated primary,
which is responsible for coordinating write operations on x
two approaches: remote-write protocols, and local-write
protocols
a. Remote-Write Protocols
all write operations are forwarded to a fixed single server
(the primary), while read operations can be carried out
locally
such schemes are known as primary-backup protocols
the backup servers are updated each time the primary is
updated
the principle of primary-backup protocol
may lead to performance problems since it may take time
before the process that initiated the write operation is
allowed to continue - updates are blocking
primary-backup protocols provide straightforward
implementation of sequential consistency; the primary can
order all incoming writes
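a toy sketch of this blocking remote-write scheme (the Primary and Backup classes are illustrative assumptions; only the rule that a write updates every backup before the client may continue comes from the slides):

```python
class Backup:
    def __init__(self):
        self.store = {}
    def apply(self, x, v):
        self.store[x] = v
    def read(self, x):
        # reads are served from the local copy
        return self.store.get(x)

class Primary(Backup):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups
    def write(self, x, v):
        self.apply(x, v)            # update the primary's copy
        for b in self.backups:      # update all backups...
            b.apply(x, v)
        return "ack"                # ...before the client continues (blocking)

backups = [Backup(), Backup()]
primary = Primary(backups)
primary.write("x", 42)
print(backups[0].read("x"))  # 42: every local read sees the ordered writes
```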
b. Local-Write Protocols
the primary migrates between processes that wish to perform a
write operation
multiple, successive write operations can be carried out locally,
while (other) reading processes can still access their local copy
such improvement is possible only if a nonblocking protocol is
followed
a nonblocking primary-backup protocol in which the primary
migrates to the process wanting to perform an update
2. Replicated-Write Protocols
unlike primary-based protocols, write operations can be
carried out at multiple replicas; two approaches: Active
Replication and Quorum-Based Protocols
a. Active Replication
each replica has an associated process that carries out
update operations
updates are generally propagated by means of the write
operation itself (the operation is propagated); it is also
possible to send the updated state
the operations need to be done in the same order
everywhere; totally-ordered multicast
two possibilities to ensure that the order is followed
Lamport's timestamps (scalability problems), or
use of a central sequencer that assigns a unique sequence
number to each operation; the operation is first sent to the
sequencer, then the sequencer forwards it to all replicas
(still a scalability problem)
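a hedged sketch of the sequencer idea (the Sequencer and Replica classes are assumptions for illustration; only the rule that replicas apply operations strictly in global sequence order comes from the slides):

```python
import heapq
import itertools

class Sequencer:
    """Assigns a globally unique, increasing sequence number per operation."""
    def __init__(self):
        self._next = itertools.count()
    def assign(self, op):
        return (next(self._next), op)

class Replica:
    def __init__(self):
        self.applied = 0        # next sequence number to apply
        self.pending = []       # operations that arrived out of order
        self.state = {}
    def deliver(self, seq, op):
        heapq.heappush(self.pending, (seq, op))
        # apply operations only in global sequence order
        while self.pending and self.pending[0][0] == self.applied:
            _, (x, v) = heapq.heappop(self.pending)
            self.state[x] = v
            self.applied += 1

seq = Sequencer()
r = Replica()
a = seq.assign(("x", 1))
b = seq.assign(("x", 2))
r.deliver(*b)               # arrives first, but waits for its turn
r.deliver(*a)
print(r.state["x"])         # 2: every replica ends with the same order
```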
b. Quorum-Based Protocols
use of voting: clients are required to request and acquire
the permission of multiple servers before either reading or
writing a replicated data item
e.g., assume a distributed file system where a file is
replicated on N servers
a client must first contact at least half + 1 (majority)
servers and get them to agree to do an update
the new update will be done and the file will be given a
new version number
to read a file, a client must also first contact at least half +
1 and ask them to send version numbers; if all version
numbers agree, this must be the most recent version
a more general approach is to arrange a read quorum (a
collection of any NR servers, or more) for reading and a
write quorum (of at least NW servers) for updating
the values of NR and NW are subject to the following two
constraints
NR + NW > N    to prevent read-write conflicts
NW > N/2       to prevent write-write conflicts
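these constraints are easy to check mechanically; a minimal sketch (the example configurations for N = 12 are chosen here for illustration):

```python
def valid_quorum(n, n_r, n_w):
    return (n_r + n_w > n      # every read quorum overlaps every write
                               # quorum: no read-write conflicts
            and 2 * n_w > n)   # two write quorums always share a server:
                               # no write-write conflicts

print(valid_quorum(12, 3, 10))  # True
print(valid_quorum(12, 7, 6))   # False: NW <= N/2 allows conflicting writes
print(valid_quorum(12, 1, 12))  # True: read one, write all (ROWA)
```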
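3. Cache-Coherence Protocols
caches form a special case of replication in that they are
controlled by clients instead of servers; cache coherence
refers to keeping a cache consistent with the rest of the
replicas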
coherence enforcement strategy: how caches are kept
consistent with the copies stored at the servers
simplest solution: do not allow shared data to be cached at
all; this forgoes the performance gain of caching
alternatively, allow caching of shared data and
let a server send an invalidation to all caches whenever a
data item is modified
or
propagate the update
Implementing Client-Centric Consistency
Naive Implementation
each write operation is given a globally unique identifier,
assigned by the server that accepts the operation for the
first time
then for each client, keep track of two sets of identifiers:
the read set consists of the write identifiers relevant for
the read operations performed by a client
the write set consists of the write identifiers performed
by the client
monotonic-read consistency is implemented as follows
when a client performs a read operation at a server, the
server is handed the client’s read set to check if all the
identified writes have taken place locally
if not, the server contacts the other servers to ensure that it
is brought up to date before carrying out the read operation
(or the read operation is forwarded to a server where the
write operations took place)
after the read operation, the relevant write operations that
have taken place at the selected servers are added to the
client’s read set
monotonic-write consistency is implemented as follows
when a client initiates a new write operation to a server, the
server is handed the client’s write set
it then ensures that the identified write operations are done
first and in the correct order
after performing the write, that operation's write
identifier is added to the write set
read-your-writes consistency is implemented as follows
it requires that the server where the read operation is
performed has seen all the write operations in the client’s
write set
the writes can be fetched from the other servers before the
read operation is performed (this may result in a poor
response time)
alternatively, the client-side software can search for a server
where the identified write operations in the client’s write set
have already been performed
writes-follow-reads consistency is implemented as follows
first bring the selected server up to date with the write
operations in the client’s read set
then add the identifier of the write operation to the write set,
along with the identifiers in the read set (which have now
become relevant for the write operation just performed)
problem: in the naive implementation, the read and write
sets can become very large
to improve efficiency, read and write operations can be
grouped into sessions, clearing the sets when the session
ends; a sketch of the whole scheme follows
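a toy end-to-end sketch of this bookkeeping (the Server class, the shared WRITE_LOG, and the use of object() as a globally unique write identifier are all assumptions made for illustration):

```python
import itertools

WRITE_LOG = {}            # wid -> (seq, x, v); shared log for the sketch
_seq = itertools.count()

class Server:
    def __init__(self):
        self.seen = set()            # identifiers of writes performed here
        self.store = {}

    def sync(self, ids):
        # bring this server up to date: perform any identified writes
        # not yet seen, in global order
        for wid in sorted(ids - self.seen, key=lambda w: WRITE_LOG[w][0]):
            _, x, v = WRITE_LOG[wid]
            self.store[x] = v
            self.seen.add(wid)

    def apply_write(self, x, v):
        wid = object()               # a globally unique identifier
        WRITE_LOG[wid] = (next(_seq), x, v)
        self.store[x] = v
        self.seen.add(wid)
        return wid

    def do_read(self, x):
        return self.store.get(x), set(self.seen)

class Client:
    def __init__(self):
        self.read_set, self.write_set = set(), set()

    def write(self, server, x, v):
        server.sync(self.write_set)          # monotonic writes
        self.write_set.add(server.apply_write(x, v))

    def read(self, server, x):
        server.sync(self.read_set | self.write_set)  # monotonic reads +
        value, relevant = server.do_read(x)          # read your writes
        self.read_set |= relevant
        return value

    def end_session(self):
        # grouping operations into sessions keeps the sets small
        self.read_set.clear()
        self.write_set.clear()

s1, s2 = Server(), Server()
c = Client()
c.write(s1, "x", 7)       # write at one replica...
print(c.read(s2, "x"))    # 7: s2 first syncs with the client's write set
```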