
DC Chap 4


Distributed Systems

(4th edition, version 01)

Chapter 04: Communication


Inter-process communication is at the heart of all distributed systems. It makes no sense to study distributed systems without carefully examining the ways that processes on different machines can exchange information.

Communication in distributed systems has traditionally been based on low-level message passing as offered by the underlying network. Expressing communication through message passing is more difficult than using primitives based on shared memory, as available for non-distributed platforms.

Modern distributed systems often consist of thousands or even millions of processes scattered across a network with unreliable communication, such as the Internet.

We then look at two widely used models for communication: Remote Procedure Call (RPC) and Message-Oriented Middleware (MOM).

Communication Foundations

Basic networking model

Drawbacks
• Focus on message-passing only
• Often unneeded or unwanted functionality
• Violates access transparency
Layered Protocols
Figure 2: A typical message as it appears on the network.

Low-level layers
Recap
• Physical layer: contains the specification and implementation of bits, and
their transmission between sender and receiver
• Data link layer: prescribes the transmission of a series of bits into a frame
to allow for error and flow control
• Network layer: describes how packets in a network of computers are to
be routed.

Observation
For many distributed systems, the lowest-level interface is that of the network
layer.


Transport Layer
Important
The transport layer provides the actual communication facilities for most
distributed systems.

Standard Internet protocols


• TCP: connection-oriented, reliable, stream-oriented communication
• UDP: unreliable (best-effort) datagram communication


Middleware layer
Observation
Middleware was invented to provide common services and protocols that can be
used by many different applications:
• A rich set of communication protocols
• (Un)marshaling of data, necessary for integrated systems
• Naming protocols, to allow easy sharing of resources
• Security protocols for secure communication
• Scaling mechanisms, such as for replication and caching

Note
What remains are truly application-specific protocols... such as?


An adapted layering scheme

Figure 4.3: An adapted reference model for networked communication.

Middleware is an application that logically lives (mostly) in the OSI application layer, but
which contains many general-purpose protocols that warrant their own layers, independent
of other, more specific applications.


Types of communication
Distinguish...

• Transient versus persistent communication


• Asynchronous versus synchronous communication

Types of Communication

Types of communication
Transient versus persistent

• Transient communication: A communication server discards a message when
it cannot be delivered at the next server, or at the receiver.
• Persistent communication: A message is stored at a communication
server as long as it takes to deliver it.


Types of communication
Places for synchronization

• At request submission
• At request delivery
• After request processing


Client/Server
Some observations
Client/Server computing is generally based on a model of transient
synchronous communication:
• Client and server have to be active at the time of communication
• Client issues request and blocks until it receives reply
• Server essentially waits only for incoming requests, and subsequently
processes them


Drawbacks of synchronous communication

• Client cannot do any other work while waiting for reply
• Failures have to be handled immediately: the client is waiting
• The model may simply not be appropriate (mail, news)


Messaging
Message-oriented middleware
Aims at high-level persistent asynchronous communication:
• Processes send each other messages, which are queued
• Sender need not wait for immediate reply, but can do other things
• Middleware often ensures fault tolerance

Communication Remote procedure call

Basic RPC operation


Observations
• Application developers are familiar with simple procedure model
• Well-engineered procedures operate in isolation (black box)
• There is no fundamental reason not to execute procedures on a separate
machine

Conclusion
Communication between caller and callee can be hidden by using a
procedure-call mechanism.


Basic RPC operation

1. Client procedure calls client stub.
2. Stub builds message; calls local OS.
3. OS sends message to remote OS.
4. Remote OS gives message to stub.
5. Stub unpacks parameters; calls server.
6. Server does local call; returns result to stub.
7. Stub builds message; calls OS.
8. OS sends message to client’s OS.
9. Client’s OS gives message to stub.
10. Client stub unpacks result; returns to client.
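
The ten steps can be mimicked in a single process to see where each stub sits. The following is only a sketch under stated assumptions: the procedure name `add`, the JSON message layout, and the `transport` function (which replaces both operating systems and the network with a plain function call) are all made up for illustration and are not part of any real RPC system.

```python
import json

# Hypothetical server-side procedure; the name `add` is an illustration only.
def add(a, b):
    return a + b

PROCEDURES = {"add": add}   # procedures the server stub may call

def server_stub(request_bytes):
    """Steps 5-7: unpack parameters, do the local call, pack the result."""
    request = json.loads(request_bytes.decode())
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

def transport(request_bytes):
    """Stand-in for steps 3-4 and 8-9: the 'network' is just a function call."""
    return server_stub(request_bytes)

def client_stub(proc, *args):
    """Steps 2 and 10: build the request message, unpack the reply."""
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode()
    reply_bytes = transport(request_bytes)
    return json.loads(reply_bytes.decode())["result"]

# Step 1: the client calls the stub as if it were a local procedure.
print(client_stub("add", 2, 3))   # prints 5
```

The caller never sees bytes or messages, which is the point of the procedure-call abstraction.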


RPC: Parameter passing


There’s more than just wrapping parameters into a message
• Client and server machines may have different data representations (think
of byte ordering)
• Wrapping a parameter means transforming a value into a sequence of
bytes
• Client and server have to agree on the same encoding:
  • How are basic data values represented (integers, floats, characters)?
  • How are complex data values represented (arrays, unions)?

Conclusion
Client and server need to properly interpret messages, transforming them into
machine-dependent representations.
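
The byte-ordering issue is easy to reproduce with Python's standard `struct` module. This is a small illustration, not a full marshaling scheme; the helper names `marshal` and `unmarshal` are hypothetical, and the choice of a 32-bit integer in network byte order is an assumption.

```python
import struct

value = 5

little = struct.pack("<i", value)   # 32-bit integer, little-endian
big = struct.pack(">i", value)      # 32-bit integer, big-endian

print(little.hex())   # 05000000
print(big.hex())      # 00000005

# Interpreting little-endian bytes as big-endian gives a different number.
misread = struct.unpack(">i", little)[0]
print(misread)        # 83886080, i.e. 5 * 2**24

# Hypothetical helpers: both sides agree on network byte order ("!").
def marshal(n):
    return struct.pack("!i", n)

def unmarshal(data):
    return struct.unpack("!i", data)[0]

print(unmarshal(marshal(value)))   # 5
```

Once both sides fix the wire format, each can convert to and from its own machine representation.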

Passing Value Parameters (1)

Figure 4-7. The steps involved in doing a remote computation through RPC.

Passing Value Parameters (2)

Figure 4-8. (a) The original message on the Pentium.

Passing Value Parameters (3)

Figure 4-8. (b) The message after receipt on the SPARC.

Passing Value Parameters (4)

Figure 4-8. (c) The message after being inverted. The little numbers in
boxes indicate the address of each byte.


RPC: Parameter passing


Some assumptions
• Copy in/copy out semantics: while procedure is executed, nothing can be
assumed about parameter values.
• All data that is to be operated on is passed by parameters. Excludes
passing references to (global) data.

Conclusion
Full access transparency cannot be realized.

A remote reference mechanism enhances access transparency


• Remote reference offers unified access to remote data
• Remote references can be passed as parameter in RPCs
• Note: stubs can sometimes be used as such references


Asynchronous RPCs
Essence
Try to get rid of the strict request-reply behavior, and let the client continue
without waiting for an answer from the server.
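
One way to picture this is with a future: the call returns at once, and the client blocks only when it actually needs the result. The sketch below uses a local thread in place of a remote server; `remote_square` and `async_call` are hypothetical names, and the `sleep` merely stands in for network and server time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_square(x):
    """Hypothetical remote procedure; the sleep mimics network and server delay."""
    time.sleep(0.1)
    return x * x

executor = ThreadPoolExecutor(max_workers=1)

def async_call(proc, *args):
    """Issue the call and return immediately with a future for the result."""
    return executor.submit(proc, *args)

future = async_call(remote_square, 7)   # the client is not blocked here
local_work = sum(range(1000))           # ... and can do other work meanwhile
result = future.result()                # block only when the answer is needed
print(local_work, result)               # 499500 49

executor.shutdown()
```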

Variations on RPC

Sending out multiple RPCs


Essence
Sending an RPC request to a group of servers.
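
A rough sketch of the idea, with local callables standing in for remote servers: the same request goes to every member of the group in parallel, and the client either collects all replies or settles for the first one. The names `make_server` and `multicast_rpc` and the reply format are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def make_server(name):
    """Hypothetical server: a callable that tags replies with its own name."""
    def handle(request):
        return f"{name}:{request.upper()}"
    return handle

servers = [make_server(n) for n in ("s1", "s2", "s3")]

def multicast_rpc(request, group, wait_for_all=True):
    """Send the same request to every server in the group in parallel."""
    with ThreadPoolExecutor(max_workers=len(group)) as pool:
        futures = [pool.submit(server, request) for server in group]
        if wait_for_all:
            return sorted(f.result() for f in futures)
        # Otherwise take whichever reply arrives first
        # (the pool still lets the remaining calls finish on exit).
        return next(as_completed(futures)).result()

print(multicast_rpc("ping", servers))
```

Whether the client should wait for all replies or only the first depends on why the group is used, for example fault tolerance versus speed.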

Writing a Client and a Server (1)

Figure 4-12. The steps in writing a client and a server in DCE RPC.

Writing a Client and a Server (2)

Three files output by the IDL compiler:

• A header file (e.g., interface.h, in C terms).
• The client stub.
• The server stub.

Binding a Client to a Server (1)

• Registration of a server makes it possible for a client to locate
the server and bind to it.
• Server location is done in two steps:
1. Locate the server’s machine.
2. Locate the server on that machine.

Binding a Client to a Server (2)

Figure 4-13. Client-to-server binding in DCE.

Communication Message-oriented communication

Transient messaging: sockets


Berkeley socket interface
Operation   Description
socket      Create a new communication end point
bind        Attach a local address to a socket
listen      Tell operating system what the maximum number of pending connection requests should be
accept      Block caller until a connection request arrives
connect     Actively attempt to establish a connection
send        Send some data over the connection
receive     Receive some data over the connection
close       Release the connection

Simple transient messaging with sockets



Sockets: Python code


Server

from socket import *

HOST, PORT = "localhost", 12345   # assumed values; the slide leaves HOST and PORT undefined

class Server:
    def run(self):
        s = socket(AF_INET, SOCK_STREAM)
        s.bind((HOST, PORT))
        s.listen(1)
        (conn, addr) = s.accept()      # returns new socket and address of the client

        while True:                    # forever
            data = conn.recv(1024)     # receive data from client
            if not data: break         # stop if client stopped
            conn.send(data + b"*")     # return sent data plus an "*"
        conn.close()                   # close the connection

Client

class Client:
    def run(self):
        s = socket(AF_INET, SOCK_STREAM)
        s.connect((HOST, PORT))        # connect to server (block until accepted)
        s.send(b"Hello, world")        # send some data
        data = s.recv(1024)            # receive the response
        print(data)                    # print what you received
        # Sending b"" transmits nothing; closing the connection is what makes
        # the server's recv() return b"" and leave its loop.
        s.close()                      # close the connection


Making sockets easier to work with


Observation
Sockets are rather low level and programming mistakes are easily made.
However, the way that they are used is often the same (such as in a client-
server setting).

Alternative: ZeroMQ
Provides a higher level of expression by pairing sockets: one for sending
messages at process P and a corresponding one at process Q for receiving
messages. All communication is asynchronous.

Three patterns
• Request-reply
• Publish-subscribe
• Pipeline

Advanced transient messaging



Request-reply

import zmq

def server():
    context = zmq.Context()
    socket = context.socket(zmq.REP)           # create reply socket
    socket.bind("tcp://*:12345")               # bind socket to address

    while True:
        message = socket.recv()                # wait for incoming message
        if b"STOP" not in message:             # if not to stop...
            reply = message.decode() + "*"     # append "*" to message
            socket.send(reply.encode())        # send it away (encoded)
        else:
            break                              # break out of loop and end

def client():
    context = zmq.Context()
    socket = context.socket(zmq.REQ)           # create request socket

    socket.connect("tcp://localhost:12345")    # connect (ZeroMQ sets up the connection in the background)
    socket.send(b"Hello world")                # send message
    message = socket.recv()                    # block until response
    socket.send(b"STOP")                       # tell server to stop
    print(message.decode())                    # print result


Publish-subscribe

import multiprocessing
import zmq, time

def server():
    context = zmq.Context()
    socket = context.socket(zmq.PUB)             # create a publisher socket
    socket.bind("tcp://*:12345")                 # bind socket to the address
    while True:
        time.sleep(5)                            # wait 5 seconds between publications
        t = "TIME " + time.asctime()
        socket.send(t.encode())                  # publish the current time

def client():
    context = zmq.Context()
    socket = context.socket(zmq.SUB)             # create a subscriber socket
    socket.connect("tcp://localhost:12345")      # connect to the server
    socket.setsockopt(zmq.SUBSCRIBE, b"TIME")    # subscribe to TIME messages

    for i in range(5):                           # five iterations
        msg = socket.recv()                      # receive a message related to subscription
        print(msg.decode())                      # print the result


Pipeline

import pickle, random, time
import zmq

NWORKERS = 10   # assumed value; the slide uses NWORKERS without defining it

def producer():
    context = zmq.Context()
    socket = context.socket(zmq.PUSH)            # create a push socket
    socket.bind("tcp://127.0.0.1:12345")         # bind socket to address

    while True:
        workload = random.randint(1, 100)        # compute workload
        socket.send(pickle.dumps(workload))      # send workload to worker
        time.sleep(workload/NWORKERS)            # balance production by waiting

def worker(id):
    context = zmq.Context()
    socket = context.socket(zmq.PULL)            # create a pull socket
    socket.connect("tcp://localhost:12345")      # connect to the producer

    while True:
        work = pickle.loads(socket.recv())       # receive work from a source
        time.sleep(work)                         # pretend to work


MPI: When lots of flexibility is needed


Representative operations

Operation       Description
MPI_BSEND       Append outgoing message to a local send buffer
MPI_SEND        Send a message and wait until copied to local or remote buffer
MPI_SSEND       Send a message and wait until transmission starts
MPI_SENDRECV    Send a message and wait for reply
MPI_ISEND       Pass reference to outgoing message, and continue
MPI_ISSEND      Pass reference to outgoing message, and wait until receipt starts
MPI_RECV        Receive a message; block if there is none
MPI_IRECV       Check if there is an incoming message, but do not block



Queue-based messaging
Four possible combinations

Message-oriented persistent communication



Message-oriented middleware
Essence
Asynchronous persistent communication through support of middleware-level
queues. Queues correspond to buffers at communication servers.

Operations

Operation   Description
PUT         Append a message to a specified queue
GET         Block until the specified queue is nonempty, and remove the first message
POLL        Check a specified queue for messages, and remove the first. Never block
NOTIFY      Install a handler to be called when a message is put into the specified queue
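
The four operations map quite directly onto Python's standard `queue` module. The class below is a minimal in-process sketch, assuming a single local queue and no persistence or routing; the class name `MessageQueue` and the handler signature are invented for illustration.

```python
import queue

class MessageQueue:
    """In-process stand-in for one middleware-level queue."""

    def __init__(self):
        self._q = queue.Queue()
        self._handlers = []

    def put(self, msg):
        """PUT: append a message, then call any installed handlers."""
        self._q.put(msg)
        for handler in self._handlers:
            handler(self)

    def get(self):
        """GET: block until the queue is nonempty, then remove the first message."""
        return self._q.get()

    def poll(self):
        """POLL: remove the first message if there is one; never block."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        """NOTIFY: install a handler that is called when a message is put."""
        self._handlers.append(handler)

mq = MessageQueue()
print(mq.poll())                 # None: the queue is empty, and poll does not block
seen = []
mq.notify(lambda q: seen.append("message arrived"))
mq.put("hello")
print(seen)                      # ['message arrived']
print(mq.get())                  # hello
```

A real message-queuing system would additionally store messages durably, so that sender and receiver need not be running at the same time.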


General model
Queue managers
Queues are managed by queue managers. An application can put messages
only into a local queue. Getting a message is possible by extracting it from a
local queue only ⇒ queue managers need to route messages.

Routing


Message broker
Observation
Message queuing systems assume a common messaging protocol: all
applications agree on message format (i.e., structure and data representation)

Broker handles application heterogeneity in an MQ system


• Transforms incoming messages to target format
• Very often acts as an application gateway
• May provide subject-based routing capabilities (i.e., publish-subscribe
capabilities)
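
The two broker roles named here, format conversion and subject-based routing, can be sketched in a few lines. Everything in this example is an assumption made for illustration: the `Broker` class, the two formats ("csv" and "json"), and the use of plain lists as subscriber inboxes.

```python
import json

def csv_to_json(body):
    """Hypothetical converter from a 'name,qty' line to a JSON document."""
    name, qty = body.split(",")
    return json.dumps({"name": name, "qty": int(qty)})

class Broker:
    def __init__(self):
        self.subscriptions = {}   # subject -> list of (inbox, wanted_format)
        self.converters = {("csv", "json"): csv_to_json}

    def subscribe(self, subject, inbox, wanted_format):
        self.subscriptions.setdefault(subject, []).append((inbox, wanted_format))

    def publish(self, subject, body, fmt):
        """Route by subject; convert when a subscriber expects another format."""
        for inbox, wanted in self.subscriptions.get(subject, []):
            if wanted != fmt:
                inbox.append(self.converters[(fmt, wanted)](body))
            else:
                inbox.append(body)

broker = Broker()
orders_json, orders_csv = [], []
broker.subscribe("orders", orders_json, "json")
broker.subscribe("orders", orders_csv, "csv")
broker.publish("orders", "widget,3", "csv")
print(orders_json, orders_csv)
```

The publisher never learns who receives the message or in which format, which is what lets heterogeneous applications share one queuing system.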


Message broker: general architecture


Example: AMQP
Lack of standardization
The Advanced Message Queuing Protocol (AMQP) was intended to play the same
role as, for example, TCP in networks: a protocol for high-level messaging with
different implementations.

Basic model
Client sets up a (stable) connection, which is a container for several (possibly
ephemeral) one-way channels. Two one-way channels can form a session. A
link is akin to a socket, and maintains state about message transfers.
Example: Advanced Message Queuing Protocol (AMQP)

Example: AMQP-based producer

import rabbitpy

def producer():
    connection = rabbitpy.Connection()                  # Connect to RabbitMQ server
    channel = connection.channel()                      # Create new channel on the connection

    exchange = rabbitpy.Exchange(channel, 'exchange')   # Create an exchange
    exchange.declare()

    queue1 = rabbitpy.Queue(channel, 'example1')        # Create 1st queue
    queue1.declare()

    queue2 = rabbitpy.Queue(channel, 'example2')        # Create 2nd queue
    queue2.declare()

    queue1.bind(exchange, 'example-key')                # Bind queue1 to a single key
    queue2.bind(exchange, 'example-key')                # Bind queue2 to the same key

    message = rabbitpy.Message(channel, 'Test message')
    message.publish(exchange, 'example-key')            # Publish the message using the key
    exchange.delete()


Example: AMQP-based consumer

import rabbitpy

def consumer():
    connection = rabbitpy.Connection()
    channel = connection.channel()

    queue = rabbitpy.Queue(channel, 'example1')

    # While there are messages in the queue, fetch them using Basic.Get
    while len(queue) > 0:
        message = queue.get()
        print('Message Q1: %s' % message.body.decode())
        message.ack()

    queue = rabbitpy.Queue(channel, 'example2')

    while len(queue) > 0:
        message = queue.get()
        print('Message Q2: %s' % message.body.decode())
        message.ack()



Communication Multicast communication

Application-level multicasting
Essence
Organize nodes of a distributed system into an overlay network and use that
network to disseminate data:
• Oftentimes a tree, leading to unique paths
• Alternatively, also mesh networks, requiring a form of routing

Application-level tree-based multicasting



Application-level multicasting in Chord


Basic approach
1. Initiator generates a multicast identifier mid.
2. Lookup succ(mid), the node responsible for mid.
3. Request is routed to succ(mid), which will become the root.
4. If P wants to join, it sends a join request to the root.
5. When request arrives at Q:
• Q has not seen a join request before ⇒ it becomes forwarder; P
becomes child of Q. Join request continues to be forwarded.
• Q knows about tree ⇒ P becomes child of Q. No need to forward
join request anymore.
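
Steps 4 and 5 can be traced with a small sketch. It assumes the route of the join request is simply given as a list of nodes; in Chord that route would come from the lookup of succ(mid), which is not modeled here. The function name `join` and the dictionary representation of the tree are choices made for this example.

```python
def join(tree, path):
    """Process a join request travelling along `path` toward the root.

    tree: dict mapping each node already in the multicast tree to its
          parent (the root maps to None).
    path: nodes visited by the request, starting at the joining node P
          and ending at the root.
    """
    child = path[0]
    for q in path[1:]:
        known = q in tree      # does Q already know about the tree?
        tree[child] = q        # the requester becomes a child of Q
        if known:
            return tree        # no need to forward the join request anymore
        child = q              # Q becomes a forwarder; the request continues
    return tree

tree = {"root": None}
join(tree, ["P1", "A", "B", "root"])   # A and B become forwarders
join(tree, ["P2", "C", "B", "root"])   # stops at B, which is already in the tree
print(tree)
```

After both joins, B has two children (A and C), and the second request never reached the root.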


ALM: Some costs


Different metrics

• Link stress: How often does an ALM message cross the same physical
link? Example: message from A to D needs to cross ⟨Ra,Rb⟩ twice.
• Stretch: Ratio in delay between ALM-level path and network-level path.
Example: messages B to C follow path of length 73 at ALM, but 47 at
network level ⇒ stretch = 73/47.
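
Both metrics are simple to compute once the overlay hops are mapped onto physical links. In the sketch below the stretch figures (73 and 47) come from the example above, while the list of physical links per overlay hop is made up for illustration and does not reproduce the actual figure.

```python
def stretch(overlay_delay, network_delay):
    """Ratio of the delay along the overlay path to that of the network-level path."""
    return overlay_delay / network_delay

def link_stress(overlay_hops):
    """Count how often each physical link is crossed by one multicast message.

    overlay_hops: one list of physical links per overlay hop."""
    counts = {}
    for hop in overlay_hops:
        for link in hop:
            counts[link] = counts.get(link, 0) + 1
    return counts

print(round(stretch(73, 47), 2))   # 1.55

# Hypothetical topology: two overlay hops that both use the link (Ra, Rb).
hops = [
    [("A", "Ra"), ("Ra", "Rb"), ("Rb", "B")],
    [("Rb", "B"), ("Ra", "Rb"), ("Ra", "D")],
]
print(link_stress(hops)[("Ra", "Rb")])   # 2
```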


Flooding
Essence
P simply sends a message m to each of its neighbors. Each neighbor will
forward that message, except to P, and only if it had not seen m before.

Flooding-based multicasting

Variation
Let Q forward a message with a certain probability $p_{flood}$, possibly even
dependent on its own number of neighbors (i.e., node degree) or the degree of
its neighbors.
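
A small simulation shows both plain flooding and the probabilistic variation. It is a sketch under simplifying assumptions: the overlay is an undirected graph given as a dictionary, the forwarding probability is a single constant rather than degree-dependent, and the source always forwards. The function name `flood` is made up for this example.

```python
import random

def flood(graph, source, p_flood=1.0, rng=random):
    """Return the set of nodes reached when `source` floods one message.

    graph:   dict mapping each node to a list of its neighbors.
    p_flood: probability that a node other than the source forwards the message.
    """
    seen = {source}
    frontier = [(source, None)]       # (node, neighbor it got the message from)
    while frontier:
        node, sender = frontier.pop()
        if node != source and rng.random() >= p_flood:
            continue                  # this node decides not to forward
        for neighbor in graph[node]:
            if neighbor == sender:
                continue              # never send the message back to the sender
            if neighbor not in seen:  # a node handles a message only the first time
                seen.add(neighbor)
                frontier.append((neighbor, node))
    return seen

ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(sorted(flood(ring, 0)))                # [0, 1, 2, 3, 4, 5]
print(sorted(flood(ring, 0, p_flood=0.0)))   # [0, 1, 5]
```

With `p_flood` below 1 fewer messages are sent, at the risk that some nodes are never reached.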


Epidemic protocols
Assume there are no write–write conflicts
• Update operations are performed at a single server
• A replica passes updated state to only a few neighbors
• Update propagation is lazy, i.e., not immediate
• Eventually, each update should reach every replica

Two forms of epidemics


• Anti-entropy: Each replica regularly chooses another replica at random,
and exchanges state differences, leading to identical states at both
afterwards
• Rumor spreading: A replica which has just been updated (i.e., has been
contaminated), tells several other replicas about its update (contaminating
them as well).

Gossip-based data dissemination



Anti-entropy
Principle operations
• A node P selects another node Q from the system at random.
• Pull: P only pulls in new updates from Q
• Push: P only pushes its own updates to Q
• Push-pull: P and Q send updates to each other

Observation
For push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes
(round = when every node has taken the initiative to start an exchange).
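
The logarithmic behavior can be observed with a small simulation. This is a sketch under simplifying assumptions: a single update that starts at node 0, partners chosen uniformly at random, and exchanges within a round applied one after another. The function name `push_pull_rounds` is made up for this example.

```python
import math
import random

def push_pull_rounds(n, seed=0):
    """Count rounds until all n nodes hold an update that starts at node 0."""
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True
    rounds = 0
    while not all(updated):
        rounds += 1
        for p in range(n):                 # every node takes the initiative once
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1                     # pick a partner other than p itself
            if updated[p] or updated[q]:   # push-pull: afterwards both are updated
                updated[p] = updated[q] = True
    return rounds

for n in (100, 1000, 10000):
    print(n, push_pull_rounds(n), round(math.log2(n), 1))
```

The printed round counts can be compared with the logarithm in the last column; they grow slowly while N grows by factors of ten.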


Anti-entropy: analysis
Basics
Consider a single source, propagating its update. Let $p_i$ be the probability that
a node has not received the update after the $i$th round.

Analysis: staying ignorant

• With pull, $p_{i+1} = (p_i)^2$: the node was not updated during the $i$th round
and should contact another ignorant node during the next round.
• With push, $p_{i+1} = p_i \left(1 - \frac{1}{N-1}\right)^{(N-1)(1-p_i)} \approx p_i e^{-1}$
(for small $p_i$ and large $N$): the node was ignorant during the $i$th round and
no updated node chooses to contact it during the next round.
• With push-pull: $(p_i)^2 \cdot (p_i e^{-1})$
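
Iterating the pull and push recurrences numerically makes the difference in convergence speed concrete. The starting value 0.5, the network size 10,000, and the stopping threshold 1/N are arbitrary choices for this sketch, and the helper names are invented.

```python
import math

def pull(p):
    """One round of pull: a node stays ignorant only if it contacts an ignorant node."""
    return p * p

def push(p, n):
    """One round of push: a node stays ignorant if no updated node contacts it."""
    return p * (1 - 1 / (n - 1)) ** ((n - 1) * (1 - p))

def rounds_until(step, p0, threshold):
    """Number of rounds until the probability of staying ignorant drops to the threshold."""
    rounds, p = 0, p0
    while p > threshold:
        p = step(p)
        rounds += 1
    return rounds

N = 10_000
print(rounds_until(pull, 0.5, 1 / N))                    # 4
print(rounds_until(lambda p: push(p, N), 0.5, 1 / N))

# For small p and large N, one push round shrinks p by roughly a factor e.
print(push(1e-3, 10**6) / 1e-3, math.exp(-1))
```

Pull squares the probability each round, whereas push only divides it by about e once few ignorant nodes remain, which is why pull finishes off the last ignorant nodes much faster.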


Anti-entropy performance


Rumor spreading
Basic model
A server S having an update to report contacts other servers. If a server is
contacted to which the update has already propagated, S stops contacting
other servers with probability $p_{stop}$.

Observation
If $s$ is the fraction of ignorant servers (i.e., which are unaware of the update), it
can be shown that with many servers

$s = e^{-(1/p_{stop} + 1)(1 - s)}$
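
The equation has no closed-form solution for s, but it is easy to solve numerically. The sketch below uses plain fixed-point iteration; the starting point 0.5, the iteration count, and the function name `ignorant_fraction` are choices made for this example. Note that s = 1 also satisfies the equation (the rumor never spreads), but the iteration moves away from that solution.

```python
import math

def ignorant_fraction(k, iterations=200):
    """Solve s = exp(-(k + 1) * (1 - s)) for s, where k = 1/p_stop."""
    s = 0.5
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1 - s))
    return s

N = 10_000
for k in range(1, 7):
    s = ignorant_fraction(k)
    print(k, round(s, 6), int(N * s))   # 1/p_stop, s, number of ignorant servers
```

For 1/p_stop = 1 this gives s ≈ 0.2032, so about a fifth of the servers never hear of the update.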


Formal analysis
Notations
Let $s$ denote the fraction of nodes that have not yet been updated (i.e., susceptible);
$i$ the fraction of updated (infected) and active nodes; and $r$ the fraction of
updated nodes that gave up (removed).

From theory of epidemics

1) $ds/dt = -s \cdot i$
2) $di/dt = s \cdot i - p_{stop} \cdot (1 - s) \cdot i$
⇒ $di/ds = -(1 + p_{stop}) + \frac{p_{stop}}{s}$
⇒ $i(s) = -(1 + p_{stop}) \cdot s + p_{stop} \cdot \ln(s) + C$

Wrap up
$i(1) = 0 \Rightarrow C = 1 + p_{stop} \Rightarrow i(s) = (1 + p_{stop}) \cdot (1 - s) + p_{stop} \cdot \ln(s)$. We are
looking for the case $i(s) = 0$, which leads to $s = e^{-(1/p_{stop} + 1)(1 - s)}$.


Rumor spreading
The effect of stopping

Consider 10,000 nodes

1/p_stop    s           Ns
1           0.203188    2032
2           0.059520    595
3           0.019827    198
4           0.006977    70
5           0.002516    25
6           0.000918    9

Note
If we really have to ensure that all servers are eventually updated, rumor
spreading alone is not enough.


Deleting values
Fundamental problem
We cannot remove an old value from a server and expect the removal to
propagate. Instead, mere removal will be undone in due time using epidemic
algorithms

Solution
Removal has to be registered as a special update by inserting a death
certificate


Deleting values
When to remove a death certificate (it is not allowed to stay forever)
• Run a global algorithm to detect whether the removal is known
everywhere, and then collect the death certificates (looks like garbage
collection)
• Assume death certificates propagate in finite time, and associate a
maximum lifetime for a certificate (can be done at risk of not reaching all
servers)

Note
It is necessary that a removal actually reaches all servers.
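
The problem and the death-certificate fix can be shown with a toy replica store. This is a sketch under assumptions that are not in the slides: every entry carries a logical timestamp, the newest timestamp wins, a value of `None` marks a death certificate, and replicas synchronize by a push-pull exchange. The class and method names are made up for illustration.

```python
class Replica:
    """Toy replica: key -> (timestamp, value); value None marks a death certificate."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, ts):
        self._apply(key, (ts, value))

    def delete(self, key, ts):
        self._apply(key, (ts, None))     # record the removal as a special update

    def _apply(self, key, entry):
        current = self.store.get(key)
        if current is None or entry[0] > current[0]:
            self.store[key] = entry      # newest timestamp wins

    def anti_entropy(self, other):
        """Push-pull exchange: afterwards both replicas hold the newest entries."""
        for key, entry in list(other.store.items()):
            self._apply(key, entry)
        for key, entry in list(self.store.items()):
            other._apply(key, entry)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def expire_certificates(self, now, max_lifetime):
        """Drop death certificates that are older than max_lifetime."""
        for key, (ts, value) in list(self.store.items()):
            if value is None and now - ts > max_lifetime:
                del self.store[key]

# Mere removal is undone by the next exchange ...
a, b = Replica(), Replica()
a.write("x", "v1", ts=1)
a.anti_entropy(b)
del a.store["x"]
a.anti_entropy(b)
print(a.read("x"))   # v1: the old value came back

# ... whereas a death certificate propagates like any other update.
a.delete("x", ts=2)
a.anti_entropy(b)
print(b.read("x"))   # None
```

Expiring certificates after a maximum lifetime, as in `expire_certificates`, carries the risk mentioned above: a replica that was unreachable for longer than that lifetime can reintroduce the old value.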
