
CS3551-DISTRIBUTED COMPUTING [UNIT II]

UNIT-II LOGICAL TIME AND GLOBAL STATE

Logical Time: Physical Clock Synchronization: NTP – A Framework for a System of Logical
Clocks – Scalar Time – Vector Time; Message Ordering and Group Communication:
Message Ordering Paradigms – Asynchronous Execution with Synchronous Communication –
Synchronous Program Order on Asynchronous System – Group Communication – Causal
Order – Total Order; Global State and Snapshot Recording Algorithms: Introduction –
System Model and Definitions – Snapshot Algorithms for FIFO Channels.

2.1 PHYSICAL CLOCK SYNCHRONIZATION: NTP


Motivation
 In centralized systems:
o There is no need for clock synchronization because there is only a single
clock. A process gets the time by issuing a system call to the kernel.
o When another process subsequently gets the time, it receives a higher time value.
Thus, there is a clear ordering of events and no ambiguity about event
occurrences.
 In distributed systems:
o There is no global clock or common memory.
o Each processor has its own internal clock and its own notion of time. These
clocks drift apart by several seconds per day, accumulating significant errors over time.
 Most applications and algorithms that run in a distributed system require:
1. The time of the day at which an event happened on a machine in the network.
2. The time interval between two events that happened on different machines in the
network.
3. The relative ordering of events that happened on different machines in the network.
 Example applications that need synchronization are: secure systems, fault diagnosis
and recovery, scheduled operations, database systems.
 Clock synchronization is the process of ensuring that physically distributed processors
have a common notion of time.
 Due to different clock rates, the clocks at various sites may diverge with time.
 To correct this, clock synchronization is performed periodically. Clocks are
synchronized to an accurate real-time standard like UTC (Universal Coordinated
Time).
 Clocks that must adhere to physical (real) time are termed physical clocks.
Definitions and terminology
Let Ca and Cb be any two clocks.
1. Time The time of a clock in a machine p is given by the function Cp(t), where Cp(t) = t for
a perfect clock.
2. Frequency Frequency is the rate at which a clock progresses. The frequency at time t of
clock Ca is Ca '(t).

3. Offset Clock offset is the difference between the time reported by a clock and the real
time. The offset of the clock Ca is given by Ca(t)−t. The offset of clock Ca relative to Cb at
time t ≥ 0 is given by Ca(t)−Cb(t).
4. Skew The skew of a clock is the difference in the frequencies of the clock and the perfect
clock. The skew of a clock Ca relative to clock Cb at time t is Ca'(t)−Cb'(t).
If the skew is bounded by ρ, then as per Eq.(3.1), clock values are allowed to diverge at a rate
in the range of 1−ρ to 1+ρ.
5. Drift (rate) The drift of clock Ca is the second derivative of the clock value with respect to
time, namely, Ca''(t). The drift of clock Ca relative to clock Cb at time t is Ca''(t)−Cb''(t).
Clock inaccuracies
 Physical clocks are synchronized to an accurate real-time standard like UTC.
However, due to clock inaccuracy, a timer (clock) is said to be working within its
specification if
1 − ρ ≤ dC/dt ≤ 1 + ρ
where the constant ρ is the maximum skew rate.

Network Time Protocol (NTP)


 The Network Time Protocol (NTP), widely used for clock synchronization on the
Internet, uses the offset delay estimation method.
 The design of NTP involves a hierarchical tree of time servers.
 The primary server at the root synchronizes with UTC.
 The next level contains secondary servers, which act as a backup to the primary
server.
 At the lowest level is the synchronization subnet which has the clients.
Clock offset and delay estimation
 This protocol performs several trials and chooses the trial with the minimum delay to
accurately estimate the local time on the target node, since message and network
delays between the nodes vary.
 Let T1, T2, T3, T4 be the values of the four most recent timestamps, as shown in the
figure.
 Assume that clocks A and B are stable and running at the same speed. Let
a = T1 − T3 and b = T2 − T4.
 If the network delay difference from A to B and from B to A, called the differential delay,
is small, the clock offset θ and round-trip delay δ of B relative to A at time T4 are
approximately given by
θ = (a + b)/2,  δ = a − b.

 Each NTP message includes the latest three timestamps T1, T2, and T3, while T4
is determined upon arrival.
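The offset and delay estimation above can be sketched in a few lines (an illustrative sketch; the function name and the sample timestamp values are assumptions, not from the original):

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from the four timestamps,
    following the formulas above: a = T1 - T3, b = T2 - T4."""
    a = t1 - t3
    b = t2 - t4
    theta = (a + b) / 2   # clock offset of B relative to A
    delta = a - b         # round-trip delay
    return theta, delta

# Hypothetical timestamps (in seconds), made up for illustration.
theta, delta = ntp_offset_delay(10.002, 10.003, 10.000, 10.005)
# theta ≈ 0.0 (clocks agree), delta ≈ 0.004 (4 ms round trip)
```

In a real NTP exchange, several such (θ, δ) samples are collected and the sample with the minimum δ is used, as described above.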


Figure :The Behavior of fast, slow and perfect clocks with respect to UTC.


The network time protocol (NTP) synchronization protocol


2.2 A FRAMEWORK FOR A SYSTEM OF LOGICAL CLOCKS


Definition
 A system of logical clocks consists of a time domain T and a logical clock C.
 Elements of T form a partially ordered set over a relation <, called happened before
or causal precedence.
 The logical clock C is a function that maps an event e to the time domain T, denoted C(e)
and called the timestamp of e, and is defined as follows:
C : H → T
 such that the following monotonicity property, called the clock consistency
condition, is satisfied:
for two events ei and ej, ei → ej ⇒ C(ei) < C(ej).
 When T and C satisfy the following condition, the system of clocks is said to be
strongly consistent:
for two events ei and ej, ei → ej ⇔ C(ei) < C(ej).
Implementing logical clocks
 Implementation of logical clocks requires addressing two issues:
 data structures local to every process to represent logical time and
 a protocol to update the data structures to ensure the consistency condition.
 Each process pi maintains data structures with the following two capabilities:
o A local logical clock, lci, that helps process pi to measure its own progress.
o A logical global clock, gci, that represents process pi’s local view of logical global time.
It allows the process to assign consistent timestamps to its local events.
 The protocol ensures that a process’s logical clock, and thus its view of global time, is
managed consistently.
 The protocol consists of the following two rules:
o R1 This rule governs how the local logical clock is updated by a process when it executes
an event (send, receive, or internal).
o R2 This rule governs how a process updates its global logical clock to update its view of the
global time and global progress. It dictates what information about the logical time is
piggybacked in a message and how it is used by the process to update its view of global
time.
2.3 SCALAR TIME
Definition
 The scalar time representation was proposed by Lamport to totally order events in a distributed
system. Time domain is represented as the set of non-negative integers.
 The logical local clock of a process pi and its local view of global time are squashed into one
integer variable Ci.
 Rules R1 and R2 used to update the clocks are as follows:
R1 : Before executing an event (send, receive, or internal), process pi executes: Ci := Ci + d (d > 0)
 d can have a different, possibly application-dependent, value. Here d is kept at 1.


R2 : Each message piggybacks the clock value of its sender at sending time. When a process pi receives
a message with timestamp Cmsg, it executes the following actions:
1. Ci := max(Ci, Cmsg);
2. execute R1;
3. deliver the message.
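Rules R1 and R2 above can be sketched as follows (a minimal illustration with d = 1; the class and method names are assumptions, not part of the original presentation):

```python
class ScalarClock:
    """Lamport scalar clock (illustrative sketch; increment d = 1)."""
    def __init__(self):
        self.c = 0

    def tick(self):            # R1: executed before any event
        self.c += 1
        return self.c

    def send(self):            # the clock value is piggybacked on the message
        return self.tick()

    def receive(self, c_msg):  # R2: take the max, execute R1, then deliver
        self.c = max(self.c, c_msg)
        return self.tick()

p1, p2 = ScalarClock(), ScalarClock()
t = p1.send()       # p1's clock becomes 1
r = p2.receive(t)   # p2's clock becomes max(0, 1) + 1 = 2
```

The receive event is thus always timestamped strictly after the corresponding send, which is exactly the consistency condition.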
 Figure 3.1 shows the evolution of scalar time with d=1.

Figure 3.1 Evolution of scalar time


Basic properties
Consistency property
 Scalar clocks satisfy the monotonicity and hence the consistency property: for two events ei and ej,
ei → ej ⇒ C(ei) < C(ej).
Total Ordering
 Scalar clocks can be used to totally order events in a distributed system.
 Problem in totally ordering events: Two or more events at different processes may have an
identical timestamp. i.e., for two events e1 and e2, C(e1) = C(e2) ⇒ e1|| e2.
 In Figure 3.1, the third event of process P1 and the second event of process P2 have the same scalar
timestamp. Thus, a tie-breaking mechanism is needed to order such events.
 A tie among events with identical scalar timestamps is broken on the basis of their process
identifiers: the lower the process identifier, the higher the priority.
 The timestamp of an event is a tuple (t, i) where t - time of occurrence and i - identity of the
process where it occurred. The total order relation ≺ on two events x and y with timestamps
(h,i) and (k,j) is:
x ≺ y⇔ (h < k or (h = k and i < j))
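Since the total order relation ≺ compares (t, i) tuples lexicographically, sorting such tuples directly yields the total order (an illustrative sketch; the sample events are made up):

```python
# Events as (timestamp, process_id) tuples. Python's lexicographic tuple
# comparison implements exactly x ≺ y ⇔ (h < k) or (h = k and i < j).
events = [(3, 2), (2, 1), (3, 1), (1, 2)]
total_order = sorted(events)
# The tie between the two events with timestamp 3 is broken in favor of
# the lower process identifier, so (3, 1) precedes (3, 2).
```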
Identical scalar timestamps

Event counting
 If the increment value d is 1, then if an event e has timestamp h, h−1 represents the
minimum number of events that happened before producing the event e;


 In the figure, five events precede event b on the longest causal path ending at b.
Figure: Five events precede event b on the longest causal path ending at b

No strong consistency
 The system of scalar clocks is not strongly consistent; that is, for two events ei and ej, C(ei) <
C(ej) ⇏ ei → ej.
 In the figure, the third event of process P1 has a smaller scalar timestamp than the third event
of process P2, yet the two are not causally related.

2.4 VECTOR TIME

Definition


 The system of vector clocks was developed independently by Fidge, Mattern, and Schmuck.
 Here, the time domain is represented by a set of n-dimensional non-negative integer vectors.
 Each process pi maintains a vector vti[1..n], where vti[i] is the local logical clock of pi that
specifies the progress at process pi.
 vti[j] represents process pi’s latest knowledge of process pj’s local time.
 If vti[j] = x, then process pi knows that local time at process pj has progressed till x.
 The entire vector vti constitutes pi’s view of global logical time and is used to timestamp
events.
 Process pi uses the following two rules R1 and R2 to update its clock:
R1:
 Before executing an event, process pi updates its local logical time as follows:
vti[i] = vti[i] + d (d>0)
R2:
 Each message m is piggybacked with the vector clock vt of the sender process at sending time.
 On receipt of message (m,vt), process pi executes:
1. update its global logical time as follows:
• 1 ≤ k ≤n : vti[k] := max(vti[k], vt[k])
2. execute R1;


3. deliver the message m.
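Rules R1 and R2 for vector clocks can be sketched as follows (a minimal illustration with d = 1; class and method names are assumptions):

```python
class VectorClock:
    """Vector clock for process i of n (illustrative sketch; d = 1)."""
    def __init__(self, i, n):
        self.i = i
        self.vt = [0] * n

    def tick(self):             # R1: executed before any event
        self.vt[self.i] += 1

    def send(self):             # piggyback a copy of vt on the message
        self.tick()
        return list(self.vt)

    def receive(self, vt_msg):  # R2: component-wise max, then R1, then deliver
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        self.tick()

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
m = p0.send()    # p0's clock becomes [1, 0]
p1.receive(m)    # p1's clock becomes [1, 1]
```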


 The timestamp associated with an event is the value of vector clock of its process when the
event is executed.
 The vector clocks progress with the increment value d = 1. Initially, it is [0, 0, 0, .. , 0].
 The following relations are defined to compare two vector timestamps vh and vk:
o vh = vk ⇔ ∀x : vh[x] = vk[x]
o vh ≤ vk ⇔ ∀x : vh[x] ≤ vk[x]
o vh < vk ⇔ vh ≤ vk and ∃x : vh[x] < vk[x]
o vh || vk ⇔ ¬(vh < vk) ∧ ¬(vk < vh)
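The comparison relations above translate directly into code (a sketch; the function names are assumptions):

```python
def vc_eq(vh, vk):
    """vh = vk: equal in every component."""
    return all(a == b for a, b in zip(vh, vk))

def vc_leq(vh, vk):
    """vh ≤ vk: less than or equal in every component."""
    return all(a <= b for a, b in zip(vh, vk))

def vc_less(vh, vk):
    """vh < vk: vh ≤ vk and strictly smaller in some component."""
    return vc_leq(vh, vk) and any(a < b for a, b in zip(vh, vk))

def vc_concurrent(vh, vk):
    """vh || vk: neither timestamp dominates the other."""
    return not vc_less(vh, vk) and not vc_less(vk, vh)
```

For example, vc_less([1, 0], [1, 1]) holds (the first event causally precedes the second), while [2, 0] and [1, 1] are concurrent.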
Basic properties
Isomorphism
 The relation “→” denotes partial order on the set of events in a distributed execution.
 If events are timestamped using vector clocks, then
 If two events x and y have timestamps vh and vk, respectively, then x→y ⇔vh < vk
x || y ⇔vh || vk
 Thus, there is an isomorphism between the set of partially ordered events and their
vector timestamps.
 Hence, to compare two timestamps consider the events x and y occurred at processes pi
and pj are assigned timestamps vh and vk, respectively, then
x → y ⇔ vh[i] ≤ vk[i]
x || y ⇔ vh[i] > vk[i] ∧ vh[j] < vk[j]
Strong consistency
 The system of vector clocks is strongly consistent.
 Hence, by examining the vector timestamps of two events, it can be determined
whether the events are causally related.
Event counting
 If d is always 1 in rule R1, then the ith component of vector clock at process pi, vti[i],
denotes the number of events that have occurred at pi until that instant.
 So, if an event e has timestamp vh, vh[j] denotes the number of events executed by
process pj that causally precede e.
 Σj vh[j] − 1 represents the total number of events that causally precede e in the distributed
computation.
Applications
 As vector time tracks causal dependencies exactly, its applications include:
 distributed debugging;
 implementation of causal ordering communication in distributed shared memory;
 establishment of global breakpoints to determine the consistency of checkpoints in
recovery.

Linear Extension
 A linear extension of a partial order (E, ≺) is a linear ordering of E that is consistent with the
partial order: if two events are ordered in the partial order, they are also ordered in the
linear order. It can be viewed as projecting all the events from the different processes onto a
single time axis.
Dimension
 The dimension of a partial order is the minimum number of linear extensions whose
intersection gives exactly the partial order.
Example:
• The timestamp of an event is the value of the vector clock of its process when the event is executed.
• Figure shows an example of vector clocks progress with the increment value d=1
• Initially, a vector clock is [0, 0, 0, ...., 0].

Vector time calculation

2.5 MESSAGE ORDERING AND GROUP COMMUNICATION


 For any two events a and b, where each can be either a send or a receive event, the
notation a ∼ b denotes that a and b occur at the same process, i.e., a ∈ Ei and b ∈ Ei for some
process i. The send and receive events of a message are called a pair of
corresponding events.
 For a given execution E, the set of all send–receive event pairs is denoted
T = {(s, r) ∈ Ei × Ej | s corresponds to r}.


2.6 MESSAGE ORDERING PARADIGMS


 The order of delivery of messages in a distributed system is an important
aspect of system executions because it determines the messaging behavior that can be expected
by the distributed program.
 Distributed program logic greatly depends on the order of delivery of messages.
 Several orderings on messages have been defined:
(i) non-FIFO,
(ii) FIFO,
(iii) causal order, and
(iv) synchronous order.
Asynchronous executions

An asynchronous execution (or A-execution) is an execution (E, ≺) for which the causality
relation is a partial order.

 On any logical link between two nodes, messages may be delivered in any order;
the link need not be FIFO.
 Though there is a physical link that delivers the messages sent on it in FIFO order due

to the physical properties of the medium, a logical link may be formed as a composite
of physical links and multiple paths may exist between the two end points of the logical
link.
(6.1 Illustrating FIFO and non-FIFO executions. (a) An A-execution that is not a
FIFO execution. (b) An A-execution that is also a FIFO execution.)
FIFO executions

A FIFO execution is an A-execution in which, for all (s, r), (s′, r′) ∈ T,
(s ∼ s′ and r ∼ r′ and s ≺ s′) ⇒ r ≺ r′.

 Although a logical link may inherently be non-FIFO, FIFO behavior can be enforced on it.


 FIFO logical channels can be realistically assumed when designing distributed
algorithms, since most transport layer protocols offer connection-oriented
service.


 A FIFO logical channel can be created over a non-FIFO channel by using a separate
numbering scheme to sequence the messages on each logical channel.
 The sender assigns and appends a <sequence_num, connection_id> tuple to each
message.
 The receiver uses a buffer to order the incoming messages as per the sender’s
sequence numbers, and accepts only the “next” message in sequence.
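The sequence-numbering scheme described above can be sketched for a single logical channel (an illustration only; the class name and message values are assumptions):

```python
class FifoReceiver:
    """Reorders messages tagged with sequence numbers on one logical channel
    (a sketch of the receiver side of the scheme described above)."""
    def __init__(self):
        self.next_seq = 0
        self.buffer = {}       # out-of-order messages, keyed by sequence number
        self.delivered = []

    def on_arrival(self, seq, msg):
        self.buffer[seq] = msg
        # Accept only the "next" message in sequence; drain the buffer
        # while consecutive messages are available.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1

r = FifoReceiver()
r.on_arrival(1, "b")   # arrives early over a non-FIFO link: buffered
r.on_arrival(0, "a")   # now both can be delivered in FIFO order
```

In the full scheme the tag is a <sequence_num, connection_id> tuple, so each logical channel keeps its own counter and buffer.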

Causally ordered (CO) executions

A CO execution is an A-execution in which, for all (s, r), (s′, r′) ∈ T,
(r ∼ r′ and s ≺ s′) ⇒ r ≺ r′.

 If two send events s and s′ are related by causal ordering (not physical time ordering),
then a causally ordered execution requires that their corresponding receive events r and
r′ occur in the same order at all common destinations.
 If s and s′ are not related by causality, then CO is vacuously satisfied.
 Causal order is used in applications that update shared data, distributed shared memory,
or fair resource allocation.

 The delayed message m is then given to the application for processing. The event of an
application processing an arrived message is referred to as a delivery event.
 No message is overtaken by a chain of messages between the same (sender, receiver)
pair.

(Fig:CO executions)
If send(m1) ≺ send(m2), then for each common destination d of messages m1 and m2,
deliverd(m1) ≺ deliverd(m2) must be satisfied.
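One standard way to enforce this delivery rule is to buffer a message until its vector timestamp shows that everything causally preceding it has been delivered. The sketch below uses a Birman–Schiper–Stephenson-style delivery condition; it is an illustration of the idea, not the algorithm given in this section:

```python
def can_deliver(v_msg, v_local, sender):
    """Delivery condition for causal ordering with vector clocks (sketch).
    A message m from `sender`, carrying vector timestamp v_msg, is deliverable
    at a process whose delivery vector is v_local iff:
      (1) m is the next message expected from its sender, and
      (2) every message causally preceding m has already been delivered."""
    if v_msg[sender] != v_local[sender] + 1:
        return False
    return all(v_msg[k] <= v_local[k]
               for k in range(len(v_msg)) if k != sender)

# m1 is sent by p1; m2 is sent by p0 after p0 has received m1, so m2
# causally depends on m1 (timestamps are made up for illustration).
m1_ok = can_deliver([0, 1, 0], [0, 0, 0], sender=1)   # m1 is deliverable
m2_ok = can_deliver([1, 1, 0], [0, 0, 0], sender=0)   # m2 must wait for m1
```

A message failing the test is simply held in a buffer and re-checked after each delivery, which realizes the safety requirement described above.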

Other properties of causal ordering


1. Message Order (MO): A MO execution is an A-execution in which, for all (s, r), (s′, r′) ∈ T,
s ≺ s′ ⇒ ¬(r′ ≺ r).
2. Empty Interval Execution: An execution (E, ≺) is an empty-interval (EI)
execution if for each pair of events (s, r) ∈ T, the open interval set
{x ∈ E | s ≺ x ≺ r}
in the partial order is empty.


3. An execution (E, ≺) is CO if and only if for each pair of events (s, r) ∈ T and each
event e ∈ E,
 weak common past: e ≺ r ⇒ ¬(s ≺ e)
 weak common future: s ≺ e ⇒ ¬(e ≺ r)
Synchronous execution (SYNC)


 When all the communication between pairs of processes uses synchronous send and
receives primitives, the resulting order is the synchronous order.
 Synchronous communication always involves a handshake between the receiver
and the sender; the handshake events may appear to occur instantaneously and
atomically.
 The instantaneous communication property of synchronous executions requires a
modified definition of the causality relation because for each (s, r) ∈ T, the send event
is not causally ordered before the receive event.
 The two events are viewed as being atomic and simultaneous, and neither event
precedes the other.

(Fig. a: Execution in an asynchronous system. Fig. b: Equivalent synchronous communication.)


Causality in a synchronous execution: The synchronous causality relation << on E is


the smallest transitive relation that satisfies the following:
S1: If x occurs before y at the same process, then x << y.

S2: If (s, r) ∈ T, then for all x ∈ E, [(x << s ⇔ x << r) and (s << x ⇔ r << x)].

S3: If x << y and y << z, then x << z.

Synchronous execution: A synchronous execution or S-execution is an execution (E,


<<) for which the causality relation << is a partial order.

Timestamping a synchronous execution: An execution (E, ≺) is synchronous if and


only if there exists a mapping from E to T (scalar timestamps) such that
 for any message M, T(s(M)) = T(r(M))

 for each process Pi , if ei≺ei’, then T(ei) < T(ei’).

2.7 ASYNCHRONOUS EXECUTION WITH SYNCHRONOUS COMMUNICATION


 When all the communication between pairs of processes is by using synchronous send and
receive primitives, the resulting order is synchronous order.
 A distributed program that runs correctly on an asynchronous system may not run
correctly with synchronous primitives: there is a possibility that the program may
deadlock, as shown by the code in Figure 6.4.
Figure 6.4 A communication program for an asynchronous system deadlocks when using synchronous
primitives.

Realizable Synchronous Communication (RSC)

An A-execution that can be realized under synchronous communication is called a realizable
with synchronous communication (RSC) execution.

Examples: the executions shown in Figure 6.5(a)–(c) using timing diagrams will deadlock if run with
synchronous primitives.

Figure 6.5 Illustrations of asynchronous executions and of crowns. (a) Crown of size 2. (b)
Another crown of size 2. (c) Crown of size 3.

 An execution can be modeled to give a total order that extends the partial order (E,
≺).

 In an A-execution, the messages can be made to appear instantaneous if there exist a


linear extension of the execution, such that each send event is immediately followed by its
corresponding receive event in this linear extension.

A non-separated linear extension of (E, ≺) is a linear extension of (E, ≺)
such that for each pair (s, r) ∈ T, the interval {x ∈ E | s ≺ x ≺ r} is empty.

An A-execution (E, ≺) is an RSC execution if and only if there exists a non-separated
linear extension of the partial order (E, ≺).

 In the non-separated linear extension, if the adjacent send event and its corresponding
receive event are viewed atomically, then that pair of events shares a common past and a
common future with each other.

Crown

Let E be an execution. A crown of size k in E is a sequence ⟨(si, ri), i ∈ {0, …, k−1}⟩ of
pairs of corresponding send and receive events such that s0 ≺ r1, s1 ≺ r2, …, sk−2 ≺ rk−1,
sk−1 ≺ r0.

The crown is <(s1, r1) (s2, r2)> as we have s1 ≺ r2 and s2 ≺ r1. Cyclic dependencies may
exist in a crown. The crown criterion states that an A-computation is RSC, i.e., it can be realized
on a system with synchronous communication, if and only if it contains no crown.
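The crown criterion can be tested by building a directed graph whose vertices are the send–receive pairs, with an edge from message i to message j whenever si ≺ rj, and checking for a cycle. A minimal sketch (the ≺ relation is supplied by the caller; the event labels in the example are made up):

```python
def has_crown(messages, precedes):
    """messages: list of (send, receive) event pairs.
    precedes(a, b): the causality relation a ≺ b, supplied by the caller.
    Returns True iff the 'crown graph' has a cycle, i.e. the A-execution
    contains a crown and is therefore not RSC."""
    n = len(messages)
    edges = {i: [j for j in range(n) if j != i
                 and precedes(messages[i][0], messages[j][1])]
             for i in range(n)}
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n

    def dfs(u):                      # standard cycle detection by DFS
        color[u] = GRAY
        for v in edges[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[i] == WHITE and dfs(i) for i in range(n))

# Crown of size 2: s1 ≺ r2 and s2 ≺ r1 (a cyclic dependency, so not RSC).
order = {("s1", "r2"), ("s2", "r1")}
msgs = [("s1", "r1"), ("s2", "r2")]
crown = has_crown(msgs, lambda a, b: (a, b) in order)
```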

Timestamp criterion for RSC execution

An execution (E, ≺) is RSC if and only if there exists a mapping from E to T (scalar
timestamps) such that

 for any message M, T(s(M)) = T(r(M));

 for each (a, b) in (E × E) \ T, a ≺ b ⇒ T(a) < T(b).

2.8 SYNCHRONOUS PROGRAM ORDER ON AN ASYNCHRONOUS SYSTEM


Real systems with instantaneous communication do not exist, so synchronous
communication cannot be naturally realized; it has to be emulated.

Non-deterministic programs


If the partial order of messages is fixed, repeated runs of the same program produce
the same partial order, thus preserving its deterministic nature.
But distributed programs often exhibit non-determinism:

 A receive call can receive a message from any sender who has sent a message, if the
expected sender is not specified.

 Multiple send and receive calls which are enabled at a process can be executed in
an interchangeable order.

 If i sends to j, and j sends to i concurrently using blocking synchronous calls, there


results a deadlock.

 There is no semantic dependency between the send and the immediately following
receive at each of the processes. If the receive call at one of the processes can be
scheduled before the send call, then there is no deadlock.

Rendezvous

Rendezvous systems are a form of synchronous communication among an arbitrary


number of asynchronous processes. All the processes involved meet with each other, i.e.,
communicate synchronously with each other at one time. Two types of rendezvous systems are
possible:

 Binary rendezvous: When two processes agree to synchronize.

 Multi-way rendezvous: When more than two processes agree to synchronize.

Features of binary rendezvous:

 For the receive command, the sender must be specified. However, multiple receive
commands can exist. A type check on the data is implicitly performed.

 Send and receive commands may be individually disabled or enabled. A command
is disabled if it is guarded and the guard evaluates to false. The guard
would likely contain an expression on some local variables.


 Synchronous communication is implemented by scheduling messages under the


covers using asynchronous communication.

 Scheduling involves pairing of matching send and receives commands that are both
enabled. The communication events for the control messages under the covers do not
alter the partial order of the execution.

Binary rendezvous algorithm

If multiple interactions are enabled, a process chooses one of them and tries to
synchronize with the partner process. The problem reduces to one of scheduling messages
satisfying the following constraints:

 Schedule on-line, atomically, and in a distributed manner.

 Schedule in a deadlock-free manner (i.e., crown-free).

 Schedule to satisfy the progress property in addition to the safety property.

Steps in Bagrodia algorithm

1. Receive commands are forever enabled from all processes.

2. A send command, once enabled, remains enabled until it completes, i.e., it is not
possible that a send command gets disabled before the send is executed.

3. To prevent deadlock, process identifiers are used to introduce asymmetry to break


potential crowns that arise.

4. Each process attempts to schedule only one send event at any time.

The message types used are: M, ack(M), request(M), and permission(M). Execution
events in the synchronous execution are only the send of the message M and the receive of the
message M. The send and receive events of the other message types – ack(M), request(M), and
permission(M) – are control events for control messages. The messages request(M), ack(M), and
permission(M) use M’s unique tag; the message M itself is not included in these messages.

(1) Pi wants to execute SEND(M) to a lower priority process Pj:

Pi executes send(M) and blocks until it receives ack(M) from Pj. The send event
SEND(M) now completes.

Any message M′ (from a higher priority process) and any request(M′) for
synchronization (from a lower priority process) received during the blocking period
are queued.

(2) Pi wants to execute SEND(M) to a higher priority process Pj:

(2a) Pi seeks permission from Pj by executing send(request(M))

// to avoid deadlock in which cyclically blocked processes queue messages.


(2b) while Pi is waiting for permission, it remains unblocked.
(i) If a message M′ arrives from a higher priority process Pk, Pi accepts M′
by scheduling a RECEIVE(M′) event and then executes
send(ack(M′)) to Pk.

(ii) If a request(M′) arrives from a lower priority process Pk, Pi executes
send(permission(M′)) to Pk and blocks waiting for the message M′.
When M′ arrives, the RECEIVE(M′) event is executed.

(2c) When the permission(M) arrives, Pi knows partner Pj is synchronized and Pi
executes send(M). The SEND(M) now completes.

(3) request(M) arrival at Pi from a lower priority process Pj:

At the time a request(M) is processed by Pi, process Pi executes send(permission(M))
to Pj and blocks waiting for the message M. When M arrives, the RECEIVE(M) event
is executed and the process unblocks.

(4) Message M arrival at Pi from a higher priority process Pj:

At the time a message M is processed by Pi, process Pi executes RECEIVE(M)
(which is assumed to be always enabled) and then executes send(ack(M)) to Pj.

(5) Processing when Pi is unblocked:

When Pi is unblocked, it dequeues the next (if any) message from the queue and
processes it as a message arrival (as per rule 3 or 4).

Fig : Bagrodia Algorithm

 The algorithm illustrates how crown-free message scheduling is achieved on-line.


Messages used to implement synchronous order. Pi has higher priority than Pj . (a) Pi
issues SEND(M).
(b) Pj issues SEND(M).

 The message types used are:

(i) M – the actual message exchanged between two processes during execution;
(ii) ack(M) – acknowledgment for the received message M;
(iii) request(M) – issued when a lower priority process wants to send a message M to a
higher priority process;
(iv) permission(M) – the response to a request(M), sent by the higher priority process to
the lower priority process.
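The two sending cases of the algorithm can be summarized by the message sequences they generate (a sketch only, omitting the blocking and queuing logic; the assumption that a lower process identifier means higher priority is mine, made for illustration):

```python
def send_sequence(sender_id, receiver_id):
    """Messages exchanged 'under the covers' to realize one synchronous
    SEND(M), assuming a lower id means higher priority (an assumption
    for this sketch; the algorithm only requires some fixed asymmetry)."""
    if sender_id < receiver_id:
        # Case 1: partner has lower priority - send directly, await ack(M).
        return ["send(M)", "ack(M)"]
    # Case 2: partner has higher priority - seek permission first,
    # then send M; SEND(M) completes when M is sent.
    return ["request(M)", "permission(M)", "send(M)"]
```

The asymmetry between the two cases is what breaks potential crowns and keeps the scheduling deadlock-free.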

(Examples showing how to schedule messages sent with synchronous primitives)

Code shown is for process Pi , 1 ≤ i ≤ n.

2.9 GROUP COMMUNICATION


Group communication is done by broadcasting messages. A message broadcast is
the sending of a message to all members in the distributed system. The communication may
also be:
 Multicast: a message is sent to a certain subset, identified as a group.
 Unicast: point-to-point message communication.

The network layer protocol cannot provide the following functionalities:


 Application-specific ordering semantics on the order of delivery of messages.
 Adapting groups to dynamically changing membership.
 Sending multicasts to an arbitrary set of processes at each send event.
 Providing various fault-tolerance semantics.
 The multicast algorithms can be open or closed group.


Differences between closed and open group algorithms:


Closed group algorithms:
o If the sender is also one of the receivers in the multicast algorithm, it is a closed
group algorithm.
o They are specific and easy to implement.
o They do not support large systems where client processes have short lifetimes.

Open group algorithms:
o If the sender is not a part of the communication group, it is an open group algorithm.
o They are more general, but difficult to design and expensive.
o They can support large systems.

2.10 CAUSAL ORDER (CO)


In the context of group communication, there are two modes of communication:
causal order and total order. Given a system with FIFO channels, causal order needs to be
explicitly enforced by a protocol. The following two criteria must be met by a causal
ordering protocol:
 Safety: In order to prevent causal order from being violated, a message M that
arrives at a process may need to be buffered until all system wide messages sent in the
causal past of the send (M) event to that same destination have already arrived. The
arrival of a message is transparent to the application process. The delivery event
corresponds to the receive event in the execution model.
 Liveness: A message that arrives at a process must eventually be delivered to the
process.
The Raynal–Schiper–Toueg algorithm
 Each message M carries a log of all other messages sent causally before M’s
send event to the same destination dest(M).
 This log can then be examined to ensure that it is safe to deliver the message.
 The Raynal–Schiper–Toueg algorithm is the canonical representative of
several algorithms that reduce the size of the local space and message space
overhead by various techniques.
 All such algorithms aim to reduce this log overhead, and the space and time overhead of
maintaining the log information at the processes.
(Fig: The Raynal–Schiper–Toueg algorithm)


Complexity:
This algorithm takes O(n²) space and message overhead.
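The core bookkeeping of the algorithm can be sketched as follows (an illustrative reconstruction, not the figure from the original; the O(n²) overhead is visible in the n×n SENT matrix piggybacked on every message):

```python
class RSTProcess:
    """Sketch of the Raynal-Schiper-Toueg causal-order bookkeeping.
    sent[j][k] = number of messages this process knows j has sent to k;
    deliv[j]   = number of messages from j delivered locally."""
    def __init__(self, pid, n):
        self.pid = pid
        self.n = n
        self.sent = [[0] * n for _ in range(n)]
        self.deliv = [0] * n

    def send(self, dest):
        # Piggyback a copy of SENT (the causal log), then count this message.
        stamp = [row[:] for row in self.sent]
        self.sent[self.pid][dest] += 1
        return stamp

    def deliverable(self, src, stamp):
        # Safe to deliver iff every message sent to us in the causal past
        # of the send event has already been delivered here.
        return all(self.deliv[j] >= stamp[j][self.pid] for j in range(self.n))

    def deliver(self, src, stamp):
        self.deliv[src] += 1
        for j in range(self.n):
            for k in range(self.n):
                self.sent[j][k] = max(self.sent[j][k], stamp[j][k])
        self.sent[src][self.pid] += 1   # account for the message just delivered

# p0 sends m1 to p2, then m2 to p1; p1 delivers m2 and sends m3 to p2.
# m3 causally depends on m1, so p2 must delay m3 until m1 is delivered.
p0, p1, p2 = (RSTProcess(i, 3) for i in range(3))
st1 = p0.send(2)
st2 = p0.send(1)
p1.deliver(0, st2)
st3 = p1.send(2)
early = p2.deliverable(1, st3)   # m1 not yet delivered at p2
p2.deliver(0, st1)
late = p2.deliverable(1, st3)    # now safe to deliver m3
```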
The Kshemkalyani –Singhal optimal algorithm
An optimal CO algorithm stores in local message logs, and propagates on messages,
information of the form “d is a destination of M” about a message M sent in the causal past, as
long as and only as long as:

Propagation Constraint I: it is not known that the message M is delivered to d.

Propagation Constraint II: it is not known that a message has been sent to d in the causal
future of Send(M), and hence it is not guaranteed using a reasoning based on transitivity that
the message M will be delivered to d in CO.

Fig 2.6: Conditions for causal ordering

The Propagation Constraints also imply that if either (I) or (II) is false, the information
“d ∈ M.Dests” must not be stored or propagated, even to remember that (I) or (II) has been
falsified:
 not in the causal future of Deliverd(Mi,a)
 not in the causal future of ek,c, where d ∈ Mk,c.Dests and there is no
other message sent causally between Mi,a and Mk,c to the same
destination d.

Information about messages:


(i) not known to be delivered
(ii) not guaranteed to be delivered in CO, is explicitly tracked by the algorithm using (source,
timestamp, destination) information.
The algorithm for the send and receive operations is given in Figs. 2.7(a) and (b). Procedure
SND is executed atomically. Procedure RCV is executed atomically, except for a possible
interruption in line 2a, where a non-blocking wait is required to meet the Delivery Condition.

Fig 2.7 a) Send algorithm by Kshemkalyani–Singhal to optimally implement causal


ordering
Fig 2.7 b) Receive algorithm by Kshemkalyani–Singhal to optimally implement causal ordering

The data structures maintained are sorted row-major and then column-major:

1. Explicit tracking:
 Tracking of (source, timestamp, destination) information for messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO is done explicitly, using the l.Dests field of entries in local logs at nodes and the o.Dests field of entries in messages.
 Sets li,a.Dests and oi,a.Dests contain explicit information of destinations to which Mi,a is not guaranteed to be delivered in CO and is not known to be delivered.
 The information about d ∈ Mi,a.Dests is propagated up to the earliest events on all causal paths from (i, a) at which it is known that Mi,a is delivered to d, or is guaranteed to be delivered to d in CO.

2. Implicit tracking:
 Tracking of messages that are either (i) already delivered, or (ii) guaranteed to be delivered in CO, is performed implicitly.
 The information about messages (i) already delivered or (ii) guaranteed to be delivered in CO is deleted and not propagated, because it is redundant as far as enforcing CO is concerned.
 It is useful in determining what information being carried in other messages and stored in logs at other nodes has become redundant and can thus be purged.
 The semantics are implicitly stored and propagated. This information about messages that are (i) already delivered or (ii) guaranteed to be delivered in CO is tracked without explicitly storing it.
 The algorithm derives it from the existing explicit information about messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO, by examining only oi,a.Dests or li,a.Dests, which is a part of the explicit information.

Fig 2.8: Illustration of propagation constraints

Multicasts M5,1 and M4,1
Message M5,1, sent to processes P4 and P6, contains the piggybacked information M5,1.Dests = {P4, P6}. Additionally, at the send event (5, 1), the information M5,1.Dests = {P4, P6} is also inserted in the local log Log5. When M5,1 is delivered to P6, the (new) piggybacked information P4 ∈ M5,1.Dests is stored in Log6 as M5,1.Dests = {P4}; the information about P6 ∈ M5,1.Dests, which was needed for routing, must not be stored in Log6 because of constraint I. In the same way, when M5,1 is delivered to process P4 at event (4, 1), only the new piggybacked information P6 ∈ M5,1.Dests is inserted in Log4 as M5,1.Dests = {P6}, which is later propagated during multicast M4,2.

Multicast M4,3
At event (4, 3), the information P6 ∈ M5,1.Dests in Log4 is propagated on multicast M4,3 only to process P6, to ensure causal delivery using the Delivery Condition. The piggybacked information on message M4,3 sent to process P3 must not contain this information because of constraint II. As long as any future message sent to P6 is delivered in causal order w.r.t. M4,3 sent to P6, it will also be delivered in causal order w.r.t. M5,1. And as M5,1 is already delivered to P4, the information M5,1.Dests = ∅ is piggybacked on M4,3 sent to P3.

Similarly, the information P6 ∈ M5,1.Dests must be deleted from Log4, as it will no longer be needed, because of constraint II. M5,1.Dests = ∅ is stored in Log4 to remember that M5,1 has been delivered or is guaranteed to be delivered in causal order to all its destinations.
Learning implicit information at P2 and P3
When message M4,2 is received by processes P2 and P3, they insert the (new) piggybacked information in their local logs, as information M5,1.Dests = {P6}. They both continue to store this in Log2 and Log3 and propagate this information on multicasts until they learn, at events (2, 4) and (3, 2) on receipt of messages M3,3 and M4,3, respectively, that any future message is expected to be delivered in causal order to process P6 w.r.t. M5,1 sent to P6. Hence, by constraint II, this information must be deleted from Log2 and Log3. The flow of events is given by:
 When M4,3 with piggybacked information M5,1.Dests = ∅ is received by P3 at (3, 2), this is inferred to be valid current implicit information about multicast M5,1, because the log Log3 already contains explicit information P6 ∈ M5,1.Dests about that multicast. Therefore, the explicit information in Log3 is inferred to be old and must be deleted to achieve optimality. M5,1.Dests is set to ∅ in Log3.
 The logic by which P2 learns this implicit knowledge on the arrival of M3,3 is identical.

Processing at P6
When message M5,1 is delivered to P6, only M5,1.Dests = {P4} is added to Log6. Further, P6 propagates only M5,1.Dests = {P4} on message M6,2, and this conveys the current implicit information that M5,1 has been delivered to P6, by its very absence in the explicit information.
 When the information P6 ∈ M5,1.Dests arrives on M4,3, piggybacked as M5,1.Dests = {P6}, it is used only to ensure causal delivery of M4,3 using the Delivery Condition, and is not inserted in Log6 (constraint I). Further, the presence of M5,1.Dests = {P4} in Log6 implies the implicit information that M5,1 has already been delivered to P6. Also, the absence of P4 in M5,1.Dests in the explicit piggybacked information implies the implicit information that M5,1 has been delivered, or is guaranteed to be delivered, in causal order to P4; therefore, M5,1.Dests is set to ∅ in Log6.
 When the information P6 ∈ M5,1.Dests arrives on M5,2, piggybacked as M5,1.Dests = {P4, P6}, it is used only to ensure causal delivery of M5,2 using the Delivery Condition, and is not inserted in Log6 because Log6 contains M5,1.Dests = ∅, which gives the implicit information that M5,1 has been delivered, or is guaranteed to be delivered, in causal order to both P4 and P6.


Processing at P1
 When M2,2 arrives carrying piggybacked information M5,1.Dests = {P6}, this (new) information is inserted in Log1.
 When M6,2 arrives with piggybacked information M5,1.Dests = {P4}, P1 learns the implicit information that M5,1 has been delivered to P6, by the very absence of explicit information P6 ∈ M5,1.Dests in the piggybacked information, and hence marks the information P6 ∈ M5,1.Dests for deletion from Log1. Simultaneously, M5,1.Dests = {P6} in Log1 implies the implicit information that M5,1 has been delivered, or is guaranteed to be delivered, in causal order to P4. Thus, P1 also learns that the explicit piggybacked information M5,1.Dests = {P4} is outdated. M5,1.Dests in Log1 is set to ∅.
 The information “P6 ∈ M5,1.Dests” piggybacked on M2,3, which arrives at P1, is inferred to be outdated using the implicit knowledge derived from “M5,1.Dests = ∅” in Log1.

2.11 TOTAL ORDER

For each pair of processes Pi and Pj and for each pair of messages Mx and My that are delivered to both processes, Mx is delivered before My at Pi if and only if Mx is delivered before My at Pj.

Example
 The execution in Figure 6.11(b) does not satisfy total order. Even if the message m did not exist, total order would not be satisfied.
 The execution in Figure 6.11(c) satisfies total order.
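This property can be checked mechanically over recorded delivery sequences (a sketch; the function and variable names are illustrative):

```python
from itertools import combinations

def satisfies_total_order(delivery_orders):
    """Check the total-order property: every pair of processes delivers
    the messages common to both in the same relative order.
    `delivery_orders` maps process name -> list of message ids in
    delivery order."""
    for a, b in combinations(delivery_orders.values(), 2):
        common = set(a) & set(b)
        if [m for m in a if m in common] != [m for m in b if m in common]:
            return False
    return True

# P1 and P2 deliver their common messages in the same order:
assert satisfies_total_order({"P1": ["m1", "m2"], "P2": ["m1", "m2", "m3"]})
# Opposite relative orders violate total order:
assert not satisfies_total_order({"P1": ["m1", "m2"], "P2": ["m2", "m1"]})
```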


Centralized Algorithm for total ordering

Each process sends the message it wants to broadcast to a centralized process, which
relays all the messages it receives to every other process over FIFO channels.

Complexity: Each message transmission takes two message hops and exactly n messages
in a system of n processes.

Drawbacks: A centralized algorithm has a single point of failure and congestion, and is
not an elegant solution.
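The relay scheme can be sketched as follows (a hedged sketch; class and variable names are illustrative). Because every broadcast passes through the single relay, and the relay's outgoing channels are FIFO, every process sees the relay's one global order:

```python
from collections import deque

class Sequencer:
    """Sketch of the centralized total-ordering algorithm: each process
    forwards its broadcast to one relay, which resends it to every
    process over FIFO channels; all processes then deliver messages in
    the relay's single order."""

    def __init__(self, processes):
        self.processes = processes                  # pid -> delivery list
        self.fifo = {p: deque() for p in processes}  # relay -> pid channels

    def broadcast(self, msg):
        # One hop in (sender -> relay), one hop out (relay -> each
        # process): two message hops and n messages for n processes.
        for q in self.fifo.values():
            q.append(msg)

    def drain(self):
        # Deliver everything queued on the FIFO channels.
        for pid, q in self.fifo.items():
            while q:
                self.processes[pid].append(q.popleft())

procs = {"P1": [], "P2": [], "P3": []}
seq = Sequencer(procs)
seq.broadcast("a")    # say, originated at P1
seq.broadcast("b")    # say, originated at P2
seq.drain()
assert procs["P1"] == procs["P2"] == procs["P3"] == ["a", "b"]
```

The relay is the single point of failure and congestion noted above.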

Three phase distributed algorithm

Three phases occur on both the sender and the receiver side.

Sender side

Phase 1
 In the first phase, a process multicasts the message M with a locally unique tag and
the local timestamp to the group members.

Phase 2
 The sender process awaits a reply from all the group members who respond with a
tentative proposal for a revised timestamp for that message M.
 The await call is non-blocking.

Phase 3
 The process multicasts the final timestamp to the group.


Fig 2.9: Sender side of three phase distributed algorithm


Receiver Side
Phase 1
 The receiver receives the message with a tentative timestamp. It updates the variable priority that tracks the highest proposed timestamp, revises the proposed timestamp to the priority, and places the message with its tag and the revised timestamp at the tail of the queue temp_Q. In the queue, the entry is marked as undeliverable.

Phase 2
 The receiver sends the revised timestamp back to the sender. The receiver then waits in a non-blocking manner for the final timestamp.

Phase 3
 The final timestamp is received from the multicaster. The corresponding
message entry in temp_Q is identified using the tag, and is marked as
deliverable after the revised timestamp is overwritten by the final
timestamp.
 The queue is then resorted using the timestamp field of the entries as the key. As the queue is already sorted except for the modified entry for the message under consideration, that message entry has to be placed in its sorted position in the queue.
 If the message entry is at the head of temp_Q, that entry, and all consecutive subsequent entries that are also marked as deliverable, are dequeued from temp_Q and enqueued in deliver_Q.
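The receiver phases can be sketched as follows (a hedged illustration: the data layout, the proposal rule max(priority + 1, ts), and the elision of actual messaging are my assumptions; ties between equal timestamps would need a process-id tiebreak in a full implementation). Replaying the Figure 6.14 example shows C and D agreeing on the order B before A:

```python
class Receiver:
    """Sketch of the receiver side of the three-phase algorithm."""

    def __init__(self):
        self.priority = 0     # highest timestamp proposed so far
        self.temp_q = {}      # tag -> [timestamp, deliverable?]
        self.deliver_q = []

    def on_revise_ts(self, tag, ts):
        # Phases 1-2: queue the message as undeliverable with a revised
        # timestamp; return the proposal (PROPOSED_TS) to the sender.
        self.priority = max(self.priority + 1, ts)
        self.temp_q[tag] = [self.priority, False]
        return self.priority

    def on_final_ts(self, tag, ts):
        # Phase 3: overwrite with the final timestamp, mark deliverable,
        # and dequeue deliverable entries from the head of the sorted queue.
        self.temp_q[tag] = [ts, True]
        for t, (stamp, ok) in sorted(self.temp_q.items(), key=lambda kv: kv[1][0]):
            if not ok:
                break
            self.deliver_q.append(t)
            del self.temp_q[t]

def final_ts(proposals):
    # Sender, phase 3: the final timestamp is the max of all proposals.
    return max(proposals)

C, D = Receiver(), Receiver()
pa = [C.on_revise_ts("A", 7)]          # C proposes 7 for A's message
pb = [D.on_revise_ts("B", 9)]          # D proposes 9 for B's message
pb.append(C.on_revise_ts("B", 9))      # C proposes 9
pa.append(D.on_revise_ts("A", 7))      # D proposes 10
C.on_final_ts("A", final_ts(pa))       # FINAL_TS(10): A stays queued behind B
D.on_final_ts("B", final_ts(pb))       # FINAL_TS(9): B delivered at D
C.on_final_ts("B", final_ts(pb))       # B, then A, delivered at C
D.on_final_ts("A", final_ts(pa))       # A delivered at D
assert C.deliver_q == D.deliver_q == ["B", "A"]
```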


Complexity
This algorithm uses three phases; to send a message to n − 1 processes, it uses 3(n − 1) messages and incurs a delay of three message hops.
Example: An example execution to illustrate the algorithm is given in Figure 6.14. Here, A and B multicast to a set of destinations, and C and D are the common destinations of both multicasts.
 Figure 6.14(a): The main sequence of steps is as follows:
1. A sends a REVISE_TS(7) message, having timestamp 7. B sends a
REVISE_TS(9) message, having timestamp 9.
2. C receives A’s REVISE_TS(7), enters the corresponding message in temp_Q,
and marks it as undeliverable; priority = 7. C then sends PROPOSED_TS(7)
message to A.
3. D receives B’s REVISE_TS(9), enters the corresponding message in temp_Q,
and marks it as undeliverable; priority = 9. D then sends PROPOSED_TS(9)
message to B.
4. C receives B’s REVISE_TS(9), enters the corresponding message in
temp_Q, and marks it as undeliverable; priority = 9. C then sends
PROPOSED_TS(9) message to B.
5. D receives A’s REVISE_TS(7), enters the corresponding message in temp_Q,
and marks it as undeliverable; priority = 10. D assigns a tentative timestamp
value of 10, which is greater than all of the timestamps on REVISE_TSs seen
so far, and then sends a PROPOSED_TS(10) message to A.
The state of the system is as shown in the figure.
 Figure 6.14(b): The main steps are as follows:
6. When A receives PROPOSED_TS(7) from C and PROPOSED_TS(10)
from D, it computes the final timestamp as max(7, 10) = 10, and sends
FINAL_TS(10) to C and D.
7. When B receives PROPOSED_TS(9) from C and PROPOSED_TS(9) from
D, it computes the final timestamp as max(9, 9)= 9, and sends
FINAL_TS(9) to C and D.
8. C receives FINAL_TS(10) from A, updates the corresponding entry in
temp_Q with the timestamp, resorts the queue, and marks the message as
deliverable. As the message is not at the head of the queue, and some entry
ahead of it is still undeliverable, the message is not moved to delivery_Q.
9. D receives FINAL_TS(9) from B, updates the corresponding entry in
temp_Q by marking the corresponding message as deliverable, and resorts
the queue. As the message is at the head of the queue, it is moved to
delivery_Q.
10. When C receives FINAL_TS(9) from B, it will update the corresponding
entry in temp_Q by marking the corresponding message as deliverable. As
the message is at the head of the queue, it is moved to the delivery_Q, and
the next message (of A), which is also deliverable, is also moved to the


delivery_Q.
11. When D receives FINAL_TS(10) from A, it will update the corresponding entry in temp_Q by marking the corresponding message as deliverable. As the message is at the head of the queue, it is moved to the delivery_Q.

Figure: An example to illustrate the three-phase total ordering algorithm. (a) A snapshot for PROPOSED_TS and REVISE_TS messages. The dashed lines show the further execution after the snapshot. (b) The FINAL_TS messages in the example.

2.12 GLOBAL STATE AND SNAPSHOT RECORDING ALGORITHMS


Introduction
 A distributed computing system consists of spatially separated processes that do not share a common memory and communicate asynchronously with each other by message passing over communication channels.
 Each component of a distributed system has a local state. The state of a process is the state of its local memory and a history of its activity.
 The state of a channel is the set of messages in transit.
 The global state of a distributed system is the collection of the states of the processes and the channels.


 Applications that use the global state information are:
o deadlock detection
o failure recovery
o debugging distributed software
 If shared memory were available, an up-to-date state of the entire system would be available to the processes sharing the memory.
 The absence of shared memory makes it difficult to obtain a coherent and complete view of the system based on the local states of individual processes.
 A global snapshot can be obtained if the components of the distributed system record their local states at the same time. This would be possible if the local clocks at processes were perfectly synchronized, or if there were a global system clock that could be read instantaneously by the processes.
 However, it is infeasible to have perfectly synchronized clocks at various sites, as the clocks are bound to drift. If processes read time from a single common clock (maintained at one process), various indeterminate transmission delays may occur.
 In both cases, the collection of local state observations is not meaningful, as discussed below.
Example: Money Transfer
 Let S1 and S2 be two distinct sites of a distributed system which maintain
bank accounts A and B, respectively. Let the communication channels from
site S1 to site S2 and from site S2 to site S1 be denoted by C12 and C21,
respectively.
 Consider the following sequence of actions, which are also illustrated in the timing diagram of Figure 4.1:
 Time t0: Initially, Account A = $600, Account B = $200, C12 = $0, C21 = $0.
 Time t1: Site S1 initiates a transfer of $50 from A to B. Hence, A = $550, B = $200, C12 = $50, C21 = $0.
 Time t2: Site S2 initiates a transfer of $80 from B to A. Hence, A = $550, B = $120, C12 = $50, C21 = $80.
 Time t3: Site S1 receives the message for the $80 credit to Account A. Hence, A = $630, B = $120, C12 = $50, C21 = $0.
 Time t4: Site S2 receives the message for the $50 credit to Account B. Hence, A = $630, B = $170, C12 = $0, C21 = $0.
 Suppose the local state of Account A is recorded at time t0 ($600), and the local states of Account B and channels C12 and C21 are recorded at time t2 ($120, $50, and $80, respectively).
 Then the recorded global state shows $850 in the system, although the total is actually $800: an extra $50 appears in the system.
 Reason: the global state recording activities of the individual components must be coordinated.
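The bookkeeping above can be replayed numerically (a sketch of the arithmetic only; the tuple layout is mine):

```python
# Each tuple is (A, B, C12, C21) after the event at that time.
timeline = {
    "t0": (600, 200, 0, 0),
    "t1": (550, 200, 50, 0),    # S1 sends $50 from A to B
    "t2": (550, 120, 50, 80),   # S2 sends $80 from B to A
    "t3": (630, 120, 50, 0),    # S1 receives the $80 credit
    "t4": (630, 170, 0, 0),     # S2 receives the $50 credit
}
# Every consistent observation conserves the $800 total:
assert all(sum(state) == 800 for state in timeline.values())

# But mixing A recorded at t0 with B, C12, C21 recorded at t2
# yields an inconsistent state showing an extra $50:
a_at_t0 = timeline["t0"][0]
_, b, c12, c21 = timeline["t2"]
assert a_at_t0 + b + c12 + c21 == 850
```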


2.13 SYSTEM MODEL AND DEFINITIONS
 The system consists of a collection of n processes, p1, p2,…, pn, that are connected
by channels.
 There is no globally shared memory, and processes communicate solely by passing messages (send and receive) asynchronously, i.e., messages are delivered reliably but with finite and arbitrary time delay.
 There is no physical global clock in the system.
 The system can be described as a directed graph where vertices represents processes
and edges represent unidirectional communication channels.
 Let Cij denote the channel from process pi to process pj .
 Processes and channels have states associated with them.
 Process state: the contents of processor registers, stacks, local memory, etc.; it depends on the local context of the distributed application.
 Channel state of Cij, denoted SCij: the set of messages in transit in the channel.
 The actions performed by a process are modeled as three types of events,
o internal events – affects the state of the process.
o message send events, and
o message receive events.
 For a message mij that is sent by process pi to process pj, let send(mij) and rec(mij) denote its send and receive events, respectively; these events affect the state of the channel.
 The events at a process are linearly ordered by their order of occurrence.
 At any instant, the state of process pi, denoted by LSi, is a result of the sequence of all the events executed by pi up to that instant.
 For an event e and a process state LSi, e ∈ LSi iff e belongs to the sequence of events that have taken process pi to state LSi.
 For an event e and a process state LSi, e ∉ LSi iff e does not belong to the sequence of events that have taken process pi to state LSi.
 For a channel Cij, the following set of messages can be defined:
Transit: transit(LSi, LSj) = {mij | send(mij) ∈ LSi ⋀ rec(mij) ∉ LSj}
 There are several models of communication among processes.
 In the FIFO model, each channel acts as a first-in first-out message queue; hence, message ordering is preserved by a channel.
 In the non-FIFO model, a channel acts like a set in which the sender process adds messages and the receiver process removes messages from it in a random order.
 Causal delivery of messages satisfies the following property: “for any two messages mij and mkj, if send(mij) → send(mkj), then rec(mij) → rec(mkj).”
 Causally ordered delivery of messages implies FIFO message delivery.
 The causal ordering model is useful in developing distributed algorithms and may simplify their design.
A consistent global state
 The global state of a distributed system is a collection of the local states of the processes and the channels. Notationally, global state GS is defined as
o GS = {∪i LSi, ∪i,j SCij}
 A global state GS is a consistent global state iff it satisfies the following two conditions:
C1: send(mij) ∈ LSi ⇒ mij ∈ SCij ⊕ rec(mij) ∈ LSj (⊕ is the Ex-OR operator)
C2: send(mij) ∉ LSi ⇒ mij ∉ SCij ∧ rec(mij) ∉ LSj
 In a consistent global state, every message that is recorded as received is also recorded as sent. Such global states are meaningful.
 Inconsistent global states are not meaningful, i.e., the receive of a message is recorded without the corresponding send.
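Conditions C1 and C2 can be checked mechanically for one channel (a sketch; the set-based encoding and the names are mine):

```python
def is_consistent(sent, received, in_transit):
    """C1/C2 check for a single channel Cij: `sent` is the set of
    messages recorded as sent in LSi, `received` those recorded as
    received in LSj, and `in_transit` the recorded channel state SCij."""
    for m in sent | received | in_transit:
        if m in sent:
            # C1: send recorded => the message is in SCij XOR received.
            if (m in in_transit) == (m in received):
                return False
        else:
            # C2: send not recorded => neither in SCij nor received.
            if m in in_transit or m in received:
                return False
    return True

assert is_consistent({"m1", "m2"}, {"m1"}, {"m2"})   # consistent
assert not is_consistent({"m1"}, {"m1"}, {"m1"})     # received AND in transit
assert not is_consistent(set(), {"m1"}, set())       # received but never sent
```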
Interpretation in terms of cuts
 A cut is a zig-zag line that connects a point on each process line of the space–time diagram at some arbitrary instant.
 A cut is a powerful graphical aid for representing and reasoning about the global states of a computation.
 The left side of the cut is referred to as the PAST and the right side as the FUTURE.
 A consistent global state corresponds to a cut in which every message received in the PAST of the cut has been sent in the PAST of that cut. Such a cut is known as a consistent cut. Example: cut C2 in the figure.
 All the messages that cross the cut from the PAST to the FUTURE are captured in the corresponding channel state.
 If a message flows from the FUTURE to the PAST, the cut is inconsistent. Example: cut C1.

Issues in recording a global state


 If a global physical clock were available, then the following simple procedure could be used to record a consistent global snapshot of a distributed system:
o The initiator of the snapshot decides a future time at which the snapshot is to be taken and broadcasts this time to every process.
o All processes take their local snapshots at that instant in the global time.
o The snapshot of channel Cij includes all the messages that process pj receives after taking the snapshot and whose timestamp is smaller than the time of the snapshot.
 However, a global physical clock is not available in a distributed system. Hence, the following two issues need to be addressed to record a consistent global snapshot.
 I1: How to distinguish the messages to be recorded in the snapshot from those not to be recorded?
 Any message that is sent by a process before recording its snapshot must be recorded in the global snapshot (from C1).
 Any message that is sent by a process after recording its snapshot must not be recorded in the global snapshot (from C2).
 I2: How to determine the instant when a process takes its snapshot?
 A process pj must record its snapshot before processing a message mij that was sent by process pi after recording its snapshot.
 These algorithms use two types of messages: computation messages and control
messages. The former are exchanged by the underlying application and the latter are
exchanged by the snapshot algorithm.
2.14 SNAPSHOT ALGORITHMS FOR FIFO CHANNELS
Each distributed application has a number of processes running on different physical servers. These processes communicate with each other through messaging channels.

A snapshot captures the local state of each process along with the state of each communication channel.

Snapshots are required for:

 checkpointing
 garbage collection
 deadlock detection
 debugging
Chandy–Lamport algorithm
 This algorithm uses a control message, called a marker.
 After a site has recorded its snapshot, it sends a marker along all of its
outgoing channels before sending out any more messages.
 Since channels are FIFO, marker separates the messages in the channel into
those to be included in the snapshot from those not to be recorded in the
snapshot. This addresses issue I1.
 The role of markers in a FIFO system is to act as delimiters for the
messages in the channels so that the channel state recorded by the process
at the receiving end of the channel satisfies the condition C2.
 Since all messages that follow a marker on channel Cij have been sent by process pi after pi has taken its snapshot, process pj must record its snapshot, if not recorded earlier, and record the state of the channel on which the marker was received. This addresses issue I2.
The algorithm
 The algorithm is initiated by any process by executing the marker sending rule.
 The algorithm terminates after each process has received a marker on all of its
incoming channels.

Algorithm 4.1 The Chandy–Lamport algorithm.

Marker sending rule for process pi

(1) Process pi records its state.
(2) For each outgoing channel C on which a marker has not been sent, pi sends a marker along C before pi sends further messages along C.

Marker receiving rule for process pj

On receiving a marker along channel C:
if pj has not recorded its state then
    Record the state of C as the empty set
    Execute the “marker sending rule”
else
    Record the state of C as the set of messages received along C after pj’s state was recorded and before pj received the marker along C
Initiating a snapshot
 Process Pi initiates the snapshot.
 Pi records its own state and prepares a special marker message.
 Pi sends the marker message to all other processes.
 Pi starts recording all incoming messages on channels Cji for j ≠ i.
Propagating a snapshot
 For each process Pj, consider a message arriving on channel Ckj.
 If the marker message is seen for the first time:
o Pj records its own state and marks Ckj as empty.
o Pj sends the marker message to all other processes.
o Pj records all incoming messages on channels Clj for l ≠ j, k.
 Otherwise, Pj adds the message to the recorded state of the inbound channel.
Terminating a snapshot
 All processes have received a marker.
 All processes have received a marker on all of their N − 1 incoming channels.
 A central server can gather the partial states to build a global snapshot.
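The marker rules can be sketched as a small two-process simulation (a hedged sketch: the class names, the numeric "state += msg" convention, and the explicit scheduling are my illustrations, not the textbook pseudocode):

```python
from collections import deque

MARKER = "MARKER"                  # the control message (a sentinel here)

class Proc:
    """Sketch of Chandy-Lamport over FIFO channels; inbox[j] is the
    FIFO queue of messages this process will receive from pj."""

    def __init__(self, pid, state, peers):
        self.pid, self.state, self.peers = pid, state, peers
        self.inbox = {j: deque() for j in peers}
        self.recorded_state = None
        self.chan_state = {}       # j -> messages recorded for channel Cji
        self.recording = set()     # incoming channels still being recorded

    def record(self, net):
        # Marker sending rule: record the local state, then send a marker
        # on every outgoing channel before any further application message.
        if self.recorded_state is not None:
            return
        self.recorded_state = self.state
        self.recording = set(self.peers)
        self.chan_state = {j: [] for j in self.peers}
        for j in self.peers:
            net[j].inbox[self.pid].append(MARKER)

    def step(self, j, net):
        # Marker receiving rule, applied to one message on channel Cji.
        msg = self.inbox[j].popleft()
        if msg == MARKER:
            self.record(net)           # first marker: record state, Cji = {}
            self.recording.discard(j)  # stop recording channel Cji
        else:
            self.state += msg          # numeric application message
            if j in self.recording:    # post-snapshot arrival, pre-marker:
                self.chan_state[j].append(msg)  # belongs to the channel state

net = {1: Proc(1, 100, [2]), 2: Proc(2, 50, [1])}
net[2].inbox[1].append(10)   # p1 -> p2: message in flight before the snapshot
net[1].record(net)           # p1 initiates: records 100, emits a marker
net[1].inbox[2].append(7)    # p2 -> p1: sent before p2 sees the marker
net[1].step(2, net)          # p1 applies 7 and records it on C21
net[2].step(1, net)          # p2 applies 10 (pre-snapshot at p2)
net[2].step(1, net)          # p2 receives the marker: records state 60
net[1].step(2, net)          # p1 receives p2's marker: C21 recording done

assert (net[1].recorded_state, net[2].recorded_state) == (100, 60)
assert net[1].chan_state == {2: [7]} and net[2].chan_state == {1: []}
```

The recorded state is consistent: the in-flight message 10 is counted in p2's local state, and the message 7 (sent before p2's snapshot, received after p1's) is captured in the state of channel C21.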

Correctness
 To prove the correctness of the algorithm, it is shown that a recorded snapshot satisfies conditions C1 and C2.
 Since a process records its snapshot when it receives the first marker on any incoming channel, no messages that follow markers on the channels incoming to it are recorded in the process’s snapshot.
 Moreover, a process stops recording the state of an incoming channel when a marker is received on that channel.
 Due to the FIFO property of channels, it follows that no message sent after the marker on that channel is recorded in the channel state. Thus, condition C2 is satisfied.
 When a process pj receives a message mij that precedes the marker on channel Cij, it acts as follows: if process pj has not taken its snapshot yet, it includes mij in its recorded snapshot; otherwise, it records mij in the state of the channel Cij. Thus, condition C1 is satisfied.
Complexity
 The recording part of a single instance of the algorithm requires O(e)
messages and O(d) time, where e is the number of edges in the network and
d is the diameter of the network.
Properties of the recorded global state
 The recorded global state may not correspond to any of the global states that occurred during the computation.
 This happens because a process can change its state asynchronously before the markers it sent are received by other sites and the other sites record their states.
 But the system could have passed through the recorded global state in some equivalent execution.
 The recorded global state is a valid state in an equivalent execution, and if a stable property (i.e., a property that persists) holds in the system before the snapshot algorithm begins, it holds in the recorded global snapshot.
 Therefore, a recorded global state is useful in detecting stable properties.

Chandy–Lamport algorithm example: snapshots for money transfer (Snapshots 1–7, shown as figures, illustrate initiating the snapshot and propagating the markers between the two sites).
