Distributed System - Event Ordering
Distributed System - Event Ordering
BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering
Time and Ordering
The two critical differences between centralized and distributed systems are:
• absence of shared memory
• absence of a global clock
Topics covered:
• how programming mechanisms change as a result of these differences
• algorithms that operate in the absence of a global clock
• algorithms that create a sense of a shared, global time
• algorithms that capture a consistent state of a system in the
absence of shared memory
BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering 1
Event Ordering
P1 P2 P3
P
Q1 Q2 Q3
Q
How can the events on P be related to the events on Q?
Which events of P “happened before” which events of Q?
Partial answer: events on P and Q are strictly ordered. So:
P1 > P2 > P3
and
Q1 > Q2 > Q3
BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering 2
Event Ordering
P1 P2 P3
P
Message
Q1 Q2 Q3
Realization: the only events on P that can causally affect events on Q are those
that involve communication between P and Q.
BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering 3
Event Ordering
P1 P2 P3
P
Message
Q1 Q2 Q3
Q
“Happened Before” relation:
If Ei and Ej are two events of the same process, then
Ei > Ej if i < j.
If Ei and Ej are two events of different processes, then
Ei > Ej
if Ei is a message send event and Ej is the corresponding message
receive event.
The relation is transitive.
BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering 4
Lamport's Algorithm
Lamport's algorithm is based on two implementation rules that define how
each process's local clock is incremented.
Notation:
• the processes are named P ,
i
• each process has a local clock, C
i
• the clock time for an event a on process P is denoted by C (a).
i i
Rule 1:
If a and b are two successive events in Pi and a > b
then Ci (b) = Ci (a) + d where d > 0.
Rule 2:
If a is a message send event on Pi and b is the message receive event on Pj
then:
• the message is assigned the timestamp t = C (a)
m i
• C (b) = max ( C , t +d)
j j m
BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering 5
Example of Lamport’s Algorithm
1 2 3 10
P1
1 4 6
P2
5
P3
1 2 3 4 5 6 7 8 9
• BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering 6
Limitation of Lamport's Algorithm
In Lamport's algorithm two events that are causally related will be related
through their clock times. That is:
If a --> b then C(a) < C(b)
However, the clock times alone do not reveal which events are causally
related. That is, if C(a) < C(b) then it is not known if a --> b or not. All
that is known is:
if C(a) < C(b) then b -/-> a
It would be useful to have a stronger property - one that guarantees that
a --> b iff C(a) < C(b)
This property is guaranteed by Vector Clocks.
• BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering 7
Vector Clock Rules
Each process Pi is equipped with a clock Ci which is an integer vector of
length n.
Ci(a) is referred to as the timestamp event a at Pi
Ci[i], the ith entry of Ci corresponds to Pi’s on logical time.
Ci[j], j i is Pi’s best guess of the logical time at Pj
Implementation rules for vector clocks:
[IR1] Clock Ci is incremented between any two successive events in
process Pi
Ci[i] := Ci[i] + d (d > 0)
[IR2] If event a is the sending of the message m by process Pi, then
message m is assigned a vector timestamp tm = Ci(a); on receiving the same
message m by process Pj, Cj is updated as follows:
k, Cj[k] := max(Cj[k], tm [k])
BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering 8
Vector Clocks
(2,3,1)
P2
(0,1,0) (2,2,0) (2,4,1)
P3
(0,0,1) (0,0,2)
BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering 9
Causal Ordering of Messages
Send(M1)
Space P1
Send(M2)
P2
P3
Time
BCS704 (DISTRIBUTED SYSTEM) Dr. AKASH SANGHI Dept. of Computer Science and Engineering 10
Birman-Schiper-Stephenson Protocol
1. Before broadcasting a message m, a process Pi increments the vector
time VTPi[i] and timestamps m. Note that (VTPi[i] - 1) indicates how many
messages from Pi precede m.
2. A process Pj Pi, upon receiving message m timestamped VTm from Pi,
delays its delivery until both the following conditions are satisfied.
a. VTPj[i] = VTm[i] - 1