
4 Merged

The document discusses the fundamentals of communication in distributed systems, focusing on the client-server model, layering, and protocols for data transmission. It covers various types of communication, including transient vs persistent and synchronous vs asynchronous, as well as the role of middleware and remote procedure calls (RPC). Additionally, it addresses the organization of socket interfaces, IP addressing, domain names, and the properties of DNS mappings in the context of network communication.

Uploaded by

htk717716
Copyright
© © All Rights Reserved

Communication
Distributed Systems
Foundations
• Layered Protocol
• Types of Communication
A Client-Server Transaction
• Most distributed systems applications are based on a client-server
model
• A server process and one or more client processes
• Server manages some resource
• Server provides services to client by managing client resources
• Server activated by request from client
The problem of communication
Applications: HTTP, Skype, SSH, FTP
Transmission media: Coaxial cable, Fiber optic, Wi-Fi

• Re-implement every application for every new underlying transmission medium?


• Change every application on any change to an underlying transmission medium?

• No! But how does the Internet design avoid this?


Solution: Layering
Applications: HTTP, Skype, SSH, FTP
Intermediate layers
Transmission media: Coaxial cable, Fiber optic, Wi-Fi

• Intermediate layers provide set of abstractions for applications and media

• New apps or media need only implement for intermediate layer’s interface
Basic Networking Model
• Physical layer: contains the specification and
implementation of bits, and their
transmission between sender and receiver
• Data link layer: prescribes how a series of bits is grouped into a frame to allow for error and flow control
• Network layer: describes how packets in a network of computers are routed and how congestion is handled
• Transport layer: establishes a reliable connection between applications running on two computers
• NOTE: For many distributed systems, the lowest-level interface is that of the network layer.

Image Source: https://www.lifewire.com/osi-model-reference-guide-816289


How application data is sent and
received?

Image Source: http://books.msspace.net/mirrorbooks/snortids/0596006616/snortids-CHP-2-SECT-2.html


How does data from host A reach host B?
How is it possible to send bits across incompatible LANs and WANs?
• Solution: protocol software running on each host and router
• Protocol is a set of rules that governs how hosts and routers should cooperate when they transfer data from network to network.
• Smooths out the differences between the different networks
Marshalling = the process of converting data structures or objects into a format suitable for
transmission or storage
Adapted Scheme with
Middleware Layer
• Middleware is invented to provide
common services and protocols that
can be used by many different
applications
• A rich set of communication protocols
• (Un)marshaling of data, necessary for
integrated systems
• Naming protocols, to allow easy
sharing of resources
• Security protocols for secure
communication
• Scaling mechanisms, such as
for replication and caching
Type of Communication

Transient vs Persistent
• Transient communication: Comm. server discards message when it cannot be delivered at the next server, or at the receiver.
• Persistent communication: A message is stored at a communication server (Middleware) as long as it takes to deliver it.

Synchronous vs Asynchronous
• Synchronous communication: Both sender and receiver should be alive
• Asynchronous communication: Sender can just pass the message for the receiver, which can read it when the receiver comes alive
Places of Synchronization
• At request submission
• At request delivery
• After request processing
Client/Server Communication
• Client/Server computing is generally based on a model of transient
synchronous communication:
• Client and server have to be active at time of communication
• Client issues request and blocks until it receives reply
• Server essentially waits only for incoming requests, and subsequently
processes them

• Drawbacks of synchronous communication


• Client cannot do any other work while waiting for reply
• Failures have to be handled immediately: the client is waiting
• The model may simply not be appropriate (mail, news)
Messaging: Message-oriented middleware
• Aims at high-level persistent asynchronous communication:
• Processes send each other messages, which are queued
• Sender need not wait for immediate reply, but can do other things
• Middleware often ensures fault tolerance
stub : a client side proxy that acts as a local representative of a remote object or service, handling network
communication and marshalling / unmarshalling of data, making remote calls appear local to the client.
Stubs simplify distributed communication by hiding the complexities of network interactions from the client
applications.

Remote Procedure
Calls
Basic RPC operation
Parameter Passing
RPC-based application support
Variations on RPC
Example: DCE RPC
Basic RPC Operation (1)
• Observations:
• Application developers are familiar
with simple procedure model
• Well-engineered procedures
operate in isolation (black box)
• There is no fundamental
reason
not to execute procedures on
separate machine

• Conclusion: Communication
between caller & callee can be
hidden by using procedure-call
mechanism.
Basic RPC Operation (2)
• Client procedure calls client stub.
• Client stub builds message; calls local OS.
• Client OS sends message to remote OS.
• Server (remote) OS gives message to server stub.
• Server stub unpacks message and does a local procedure call; server returns result to server stub.
• Server stub builds message; calls server OS.
• Server OS sends message to client OS.
• Client OS gives message to client stub.
• Client stub unpacks result; returns to client procedure
DistributedSystems\4.Communication\rpc_dblist.py
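The steps of the basic RPC operation can be sketched in a single process. This is a minimal illustration and not the contents of rpc_dblist.py: the procedure `add`, the JSON message layout, and the `transport` function (which stands in for the client OS and server OS exchanging the message) are all assumptions made for the example.

```python
import json

# Hypothetical remote procedure; the name and registry are illustrative only.
def add(a, b):
    return a + b

PROCEDURES = {"add": add}

def server_stub(request_bytes):
    """Unpack the request, make the local procedure call, pack the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def transport(request_bytes):
    """Stand-in for the client OS -> server OS -> client OS round trip."""
    return server_stub(request_bytes)

def client_stub(proc, *args):
    """Build the message, hand it to the transport, and unpack the reply."""
    message = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply = transport(message)
    return json.loads(reply.decode("utf-8"))["result"]

if __name__ == "__main__":
    # To the caller this looks like an ordinary local call.
    print(client_stub("add", 2, 3))
```

In a real system `transport` would send the bytes over a network connection; the stubs would stay the same.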
RPC Parameter Passing (1)
• There's more than just wrapping parameters into a message
• Client and server machines may have different data representations (e.g.
little endian or big endian)
• Wrapping a parameter means transforming a value into a sequence of
bytes
• Client and server have to agree on the same encoding:
• How are basic data values represented (integers, floats,
characters)
• How are complex data values represented (arrays, unions)
• Conclusion: Client and server need to properly interpret messages,
transforming them into machine-dependent representations.
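A small sketch of such an agreed encoding, using Python's standard `struct` module. The message layout (a 4-byte integer, an 8-byte float, then a length-prefixed UTF-8 string) is an assumption chosen for illustration; the point is that both sides fix the byte order explicitly instead of relying on the machine's native representation.

```python
import struct

# "!" selects network (big-endian) byte order with no padding:
# i = 4-byte int, d = 8-byte float, H = 2-byte unsigned length of the string.
HEADER = "!idH"

def marshal(number, real, text):
    """Turn (int, float, str) into a machine-independent byte sequence."""
    data = text.encode("utf-8")
    return struct.pack(HEADER, number, real, len(data)) + data

def unmarshal(buf):
    """Rebuild the values from the byte sequence produced by marshal()."""
    number, real, length = struct.unpack_from(HEADER, buf, 0)
    offset = struct.calcsize(HEADER)
    text = buf[offset:offset + length].decode("utf-8")
    return number, real, text

if __name__ == "__main__":
    wire = marshal(7, 2.5, "hello")
    print(wire.hex(), unmarshal(wire))
    # The same integer has different byte sequences in the two byte orders.
    print(struct.pack("<i", 1).hex(), struct.pack(">i", 1).hex())
```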
Examples of Serializable in Java
• Sharing object through binary file:
DistributedSystems\4.Communication\PersonSerialize.java
• Sending object over network:
DistributedSystems\4.Communication\Message.java
DistributedSystems\4.Communication\MessageServer.java
DistributedSystems\4.Communication\MessageClient.java
Reference Code from https://gist.github.com/chatton/14110d2550126b12c0254501dde73616
RPC Parameter Passing (2)
• Some assumptions:
• Copy in/copy out semantics: while procedure is executed, nothing can
be assumed about parameter values.
• All data that is to be operated on is passed by parameters. Excludes
passing references to (global) data.

• Conclusion: Full access transparency cannot be realized.
• A remote reference mechanism enhances access transparency
• Remote reference offers unified access to remote data
• Remote references can be passed as parameters in RPCs
• Note: stubs can sometimes be used as such references

DistributedSystems\4.Communication\rpc-dblist_param_marshalling.py
Parameter Passing in Object Based
System
RPC-based Application Support
• Stub Generation
void someFunction(char x, float y, int z[5])

• Forming a message format agreed by client and server


• Network protocol (TCP/UDP) to send and receive message
RPC Language-Based Support in Java using Java Remote Method Invocation (RMI)
• Define Remote Interface: DistributedSystems\4.Communication\RMIHello.java
• Implement Server: DistributedSystems\4.Communication\RMIServer.java
• Implement Client: DistributedSystems\4.Communication\RMIClient.java
• Compile: javac RMIHello.java RMIServer.java RMIClient.java
• Start Registry in Background: rmiregistry &
• Start Server in Background: java -classpath . -
Example Source: https://docs.oracle.com/javase/8/docs/technotes/guides/rmi/hello/hello-world.html#define
RPC Language-Based Support in
Python using RPyC
RPyC Basic Example
• DistributedSystems\4.Communication\rpyc_client_basic.py
• DistributedSystems\4.Communication\rpyc_server_basic.py

• Rpyc DBList Example


• DistributedSystems\4.Communication\rpyc_dblist_client.py
• DistributedSystems\4.Communication\rpyc_dblist_server.py
Asynchronous RPC

• Try to get rid of the strict request-reply behavior, but let the client continue without waiting for an answer from the server. Two cases:
• Client is not expecting any return value
• Client is expecting return value but is not willing to wait for result
• One-way RPC: After the client sends a request, it does not wait for ack. Causes problems for unreliable network connection
• Deferred RPC: The client sends requests to the RPC server and upon receiving ack continues. The server sends the result when ready. The client callback function executes to process the results.
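The case where the client continues and a callback later processes the result can be sketched with a thread pool. This is only an in-process illustration: `remote_square` is a hypothetical stand-in for a procedure on the RPC server, and the thread pool plays the role of the network and the server.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_square(x):
    """Hypothetical remote procedure; the sleep imitates network and processing delay."""
    time.sleep(0.05)
    return x * x

def deferred_call(executor, func, callback, *args):
    """Submit the request and return immediately; callback runs when the result is ready."""
    future = executor.submit(func, *args)
    future.add_done_callback(lambda done: callback(done.result()))
    return future

def demo():
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        deferred_call(pool, remote_square, results.append, 7)
        # The client is free to do other work here instead of blocking on the reply.
    # Leaving the "with" block waits for outstanding calls and their callbacks.
    return results

if __name__ == "__main__":
    print(demo())
```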
Implement asynchronous RPC yourself in the language of your choice
Multiple RPC
• Client sends requests to group of servers using deferred RPC or one-way RPC
• Executes callback for each request upon receiving the results from each server
• Assimilates the results to make
final
Example: Distributed Computing
Environment (DCE) RPC
• Developed by the Open Software Foundation (OSF)
• The basis for Microsoft's distributed computing environment (DCOM)
• Used in Samba, a file server that gives non-Windows systems access to files on Windows file systems
• Uses RPC protocol suite
Writing client
and server in
DCE RPC
Client to Server Binding (DCE)
Issues
(1) Client must locate server machine, and (2) locate the server
(i.e. server process).
Message Oriented Communication
Simple transient messaging with sockets (Socket Programming)
Advanced transient messaging (MPI)
Message-oriented persistent communication
Example: IBM’s WebSphere message-queuing system
Example: Advanced Message Queuing Protocol (AMQP)
How does a specific application on the client connect to a specific application on the server?
• In this example, the server runs 3 different applications: a database server, an email (SMTP) server, and a web (HTTP) server.
• How will a client connect through database UI to the database server and from the browser to the email and web server?
Image Source: https://aviadezra.blogspot.com/2008/07/code-sample-sending-typed-serialized.html
Hardware and Software
Organization of Socket
Interface
Internet From a Programmer's View
1. Hosts are mapped to a set of 32-bit IPv4 addresses, e.g. 128.2.203.179
2. The set of IP addresses is mapped to a set of identifiers called Internet domain names.
• 104.238.110.159 is mapped to www.daiict.ac.in
$ ping www.daiict.ac.in
PING www.daiict.ac.in (104.238.110.159) 56(84) bytes of data.
64 bytes from ip-104-238-110-159.ip.secureserver.net (104.238.110.159): icmp_seq=1 ttl=57 time=349 ms
^C
--- www.daiict.ac.in ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 349.388/349.388/349.388/0.000 ms
You can also use https://www.whatismyip.com/dns-lookup/ to get the IP address mapped to a domain name.
3. A process on one Internet host can communicate with a process on another Internet host over an Internet connection
(1) IP Addresses
• 32-bit IP addresses are stored in an IP address struct
• IP addresses are always stored in memory in network byte order (big-endian byte order)
• True in general for any integer transferred in a packet header from one machine to another.
• E.g., the port number used to identify an Internet connection.
/* Internet address structure */
struct in_addr {
uint32_t s_addr; /* network byte order (big-endian) */
};
• By convention, each byte in a 32-bit IP address is represented by its decimal value and separated by a period
• IP address: 0x8002C2F2 = 128.2.194.242 (dotted decimal format)
Byte layout of 0x8002C2F2 in memory:
Address | 1000 | 1001 | 1002 | 1003
Big-endian | 0x80 (MSB) | 0x02 | 0xC2 | 0xF2 (LSB)
Little-endian | 0xF2 (LSB) | 0xC2 | 0x02 | 0x80 (MSB)
• Use getaddrinfo and getnameinfo functions to convert between IP addresses and dotted decimal format.
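The conversion between the 32-bit value 0x8002C2F2 and its dotted decimal form can be checked with Python's standard `socket` and `struct` modules. The helper names `to_dotted` and `from_dotted` are made up for this sketch.

```python
import socket
import struct

def to_dotted(addr):
    """32-bit integer -> dotted decimal string, via network (big-endian) byte order."""
    return socket.inet_ntoa(struct.pack("!I", addr))

def from_dotted(text):
    """Dotted decimal string -> 32-bit integer."""
    return struct.unpack("!I", socket.inet_aton(text))[0]

if __name__ == "__main__":
    print(to_dotted(0x8002C2F2))
    print(hex(from_dotted("128.2.194.242")))
    # Byte sequences of the same value in big-endian and little-endian order.
    print(struct.pack(">I", 0x8002C2F2).hex(), struct.pack("<I", 0x8002C2F2).hex())
```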
(2) Internet Domain Names
• The Internet maintains a mapping between IP addresses and domain names in a huge worldwide distributed database called DNS
• Conceptually, programmers can view the DNS database as a collection of millions of host entries.
• Each host entry defines the mapping between a set of domain names and IP addresses.
• In a mathematical sense, a host entry is an equivalence class of domain names and IP addresses.
[Figure: domain name hierarchy. Top level domain names: ac, …, in, be. 2nd level domain names: co, org, …, ac, edu, gov, … 3rd level domain names: daiict]
Properties of DNS Mappings
• Can explore properties of DNS mappings using nslookup
• Each host has a locally defined domain name = localhost and IP = 127.0.0.1
• Simple case: one-to-one mapping between domain name and IP address:
$ nslookup www.daiict.ac.in
Address: 20.198.80.43
• Multiple domain names mapped to the same IP address: nslookup cs.mit.edu and nslookup eecs.mit.edu both return the same IP address
Address: 18.25.0.23
• Same domain name mapped to multiple IP addresses
$ nslookup www.netflix.com
Address: 3.251.50.149
Address: 54.74.73.31
Address: 54.155.178.5
(3) Internet Connections
• Clients and servers communicate by sending streams of bytes over connections. Each connection is:
• Point-to-point: connects a pair of processes.
• Full-duplex: data can flow in both directions at the same time.
• Reliable: stream of bytes sent by the source is eventually received by the destination in the same order it was sent.
• A socket is an endpoint of a connection
• Socket address is an IPaddress:port pair
• A port is a 16-bit integer that identifies a process:
• Ephemeral port: Assigned automatically by client kernel when client makes a connection request.
• Well-known port: Associated with some service provided by a server (e.g., port 80 is associated with Web servers)
Well-known Ports and Service
Names
• Popular services have permanently assigned well-known ports and corresponding well-known service names:
• echo server: 7/echo
• ssh servers: 22/ssh
• email server: 25/smtp
• Web servers: 80/http
• File Transfer Protocol server: 21/ftp
• Mappings between well-known ports and service names are contained in the file /etc/services on each Linux machine.
Anatomy of Connection
• A connection is uniquely identified by the socket addresses of its endpoints (socket pair):
• (clientIPaddr:clientport, serverIPaddr:serverport)
Using Ports to Identify Services
Sockets
What is a socket?
• To the kernel, a socket is an endpoint of communication (i.e. IP address and port number)
• To an application, a socket is a file descriptor that lets the application read/write from/to the network
NOTE: All Unix I/O devices, including networks, are modeled as files
• Clients and servers communicate with each other by reading from and writing to socket descriptors
• The main distinction between regular file I/O and socket I/O is how the application “opens” the socket descriptors
Socket Address Structures
• Internet-specific socket address IPv4: Must cast (struct sockaddr_in *) to (struct sockaddr *) for functions that take socket address arguments.
struct in_addr {
    uint32_t s_addr; /* network byte order (big-endian) */
};
struct sockaddr_in {
    uint16_t sin_family; /* Protocol family (always AF_INET) */
    uint16_t sin_port; /* Port num in network byte order */
    struct in_addr sin_addr; /* IP addr in network byte order */
    unsigned char sin_zero[8]; /* Pad to sizeof(struct sockaddr) */
};
• Internet-specific socket address IPv6
struct in6_addr {
    unsigned char s6_addr[16]; /* IPv6 address */
};
struct sockaddr_in6 {
    sa_family_t sin6_family; /* AF_INET6 */
    in_port_t sin6_port; /* port number */
    uint32_t sin6_flowinfo; /* IPv6 flow information */
    struct in6_addr sin6_addr; /* IPv6 address */
    uint32_t sin6_scope_id; /* Scope ID (new in 2.4) */
};
Host and Service Conversion: getaddrinfo()
Works for both IPv4 and IPv6 simultaneously
• Given host and service, getaddrinfo returns result that points to a linked list of addrinfo structs, each of which points to a corresponding socket address struct, and which contains arguments for the sockets interface functions.
int getaddrinfo(const char *host, /* Hostname or address */
                const char *service, /* Port or service name */
                const struct addrinfo *hints, /* Input parameters (Filtering) */
                struct addrinfo **result); /* Output linked list */
void freeaddrinfo(struct addrinfo *result); /* Free linked list */
const char *gai_strerror(int errcode); /* Return error msg from error code */
Linked List returned by getaddrinfo()
• getaddrinfo is the modern way to
convert string representations of
hostnames, host addresses, ports,
and service names to socket
address structures.
• Advantages:
• Reentrant (can be safely used by
threaded programs).
• Allows us to write portable protocol-
independent code (works with IPv4
and IPv6 addresses)
• Disadvantages
• Somewhat complex
addrinfo structure
struct addrinfo {
    int ai_flags; /* Hints argument flags (AI_PASSIVE: used in server for passive TCP connection, AI_ADDRCONFIG: return only address families (IPv4 or IPv6) that are configured on the local host, AI_NUMERICSERV: used when providing numeric value of port number) */
    int ai_family; /* First arg to socket function (AF_INET or AF_INET6 or AF_UNSPEC) */
    int ai_socktype; /* Second arg to socket function (SOCK_STREAM or SOCK_DGRAM or 0 means ANY) */
    int ai_protocol; /* Third arg to socket function (0 means ANY, generally only 1 protocol per family) */
    char *ai_canonname; /* Canonical host name */
    size_t ai_addrlen; /* Size of ai_addr struct */
    struct sockaddr *ai_addr; /* Ptr to socket address structure */
    struct addrinfo *ai_next; /* Ptr to next item in linked list */
};

• Each addrinfo struct returned by getaddrinfo contains arguments that can be passed directly to socket function.
• Also points to a socket address struct that can be passed directly to connect and bind
functions.
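Python exposes the same call as `socket.getaddrinfo`, which returns a list of tuples instead of a linked list. The sketch below is an illustration separate from hostinfo.c; it uses a numeric host so that it runs without a DNS lookup, and the helper name `resolve` is an assumption.

```python
import socket

def resolve(host, service):
    """Return (address family, socket address) pairs for host and service."""
    infos = socket.getaddrinfo(host, service,
                               family=socket.AF_UNSPEC,   # IPv4 or IPv6
                               type=socket.SOCK_STREAM)   # TCP only
    # Each entry mirrors an addrinfo struct:
    # (ai_family, ai_socktype, ai_protocol, ai_canonname, ai_addr)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]

if __name__ == "__main__":
    for family, sockaddr in resolve("127.0.0.1", 80):
        print(family.name, sockaddr)
```

Each returned family and socket address can be passed on to `socket.socket` and `connect` or `bind`, which is the same pattern the C interface follows.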
Host and Service Conversion: getnameinfo()
int getnameinfo(
    const struct sockaddr *sa, socklen_t salen, /* In: socket addr */
    char *host, size_t hostlen, /* Out: host */
    char *serv, size_t servlen, /* Out: service */
    int flags); /* optional flags */
flags = NI_NUMERICHOST | NI_NUMERICSERV;
/* Display address string instead of domain name and port number instead of service name */
• getnameinfo converts a socket address to the corresponding host (name or IP) and service (service or port).
• Replaces obsolete gethostbyaddr and getservbyport funcs.
• Reentrant and protocol independent.
getaddrinfo() example
DistributedSystems\4.Communication\hostinfo.c
If we disable line
#define IPv4 1, it will provide IPv4 as well as IPv6
addresses
$ ./hostinfo.out www.google.com
172.217.163.100
2404:6800:4007:811::2004

If we enable line
#define IPv4 1, it will provide
IPv4 addresses only
$ ./hostinfo.out www.google.com
172.217.163.100
Transient Messaging Through Socket
Interface

Operation | Client/Server | Description
socket | Client and Server | Create new communication end point (file descriptor)
bind | Server | Attach local socket address to end point
listen | Server | Specify maximum number of pending connection requests that the server socket should queue
accept | Server | Server waits for new connect request here
connect | Client | Client actively attempts to set up connection
send/receive (Python), read/write (C) | Client and Server | Exchange data between client and server
close | Client and Server | Close/release the connection
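The sequence of operations above can be exercised end to end on the loopback interface. This is a minimal sketch and not the course's echoserver.py or echoclient.py: the function names are made up, the server handles exactly one client, and port 0 asks the kernel for any free port.

```python
import socket
import threading

def serve_one_client(listen_sock):
    """accept one connection and echo everything back until the client closes."""
    conn, _addr = listen_sock.accept()           # accept
    with conn:
        while True:
            data = conn.recv(1024)               # receive
            if not data:
                break                            # client closed the connection
            conn.sendall(data)                   # send

def echo_once(message):
    listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket
    listen_sock.bind(("127.0.0.1", 0))                               # bind
    listen_sock.listen(1)                                            # listen
    port = listen_sock.getsockname()[1]
    server = threading.Thread(target=serve_one_client, args=(listen_sock,))
    server.start()

    reply = b""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:  # socket
        client.connect(("127.0.0.1", port))                            # connect
        client.sendall(message)
        while len(reply) < len(message):
            chunk = client.recv(1024)
            if not chunk:
                break
            reply += chunk
    # Leaving the "with" block closes the client socket.                # close
    server.join()
    listen_sock.close()
    return reply

if __name__ == "__main__":
    print(echo_once(b"message from our client"))
```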
Implementation in C: Accept Illustrated
Implementation in C: Connection vs Listening Descriptors
• Listening descriptor (listenfd)
• End point for client connection requests
• Created once and exists for lifetime of the server
• Connection descriptor (connfd)
• End point of the connection between client and server
• A new descriptor is created each time the server accepts a connection request from a client
• Exists only as long as it takes to service client
• Server side listenfd is bound to server socket address for listening to client connection requests.
• Client side clientfd is bound to server socket address to send data to and receive data from the server.
• Server side connfd is bound to client socket address to send data to and receive data from the client.
Test Your Own Echo Client Using Default Echo Server on Port 7 (without using your own server)
• First, make sure echo service is running on port 7:
$ cat /etc/services | grep 7/tcp
echo 7/tcp
If not, you will have to install inetd service to have echo service on port 7
$ ./echoclient.out localhost 7
host:127.0.0.1, service:7
message from our client to echo server on port 7
message from our client to echo server on port 7 → echoed
another test
another test → echoed
^C
Our Echo Client Implementation: DistributedSystems\4.Communication\echoclient.c
Test Your Own Echo Server Using Telnet (without using your own client)
Client terminal:
$ telnet 10.0.0.6 15020
Trying 10.0.0.6...
Connected to 10.0.0.6.
Escape character is '^]'.
this is a test from telnet client
this is a test from telnet client
howdy from telnet client
howdy from telnet client
^]
telnet> Connection closed.
$
Server terminal:
$ ./server.out 15020
Waiting for a new Client to connect
Connected to (10.0.0.6, 56166)
Start Communication with Client
server received 35 bytes
server received message : this is a test from telnet client
server received 26 bytes
server received message : howdy from telnet client
End Communication with Client
Waiting for a new Client to connect
Our Echo Server Implementation: DistributedSystems\4.Communication\echoserver.c
Echo Server Problem
• Echo Server is able to handle only 1 client connection at a time because the main thread stays in the read loop for that client until the client ends the connection
• How to fix it?
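One common answer is to serve every accepted connection in its own thread, so the main thread goes straight back to accepting. The sketch below uses Python's standard `socketserver` module as one possible way to do this; it is not the course's echoserver code, and the class and function names are assumptions.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    """Runs in a separate thread per connection."""
    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data:
                break
            self.request.sendall(data)

class ThreadedEchoServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True
    allow_reuse_address = True

def recv_exact(sock, count):
    """Read until count bytes have arrived or the peer closes."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

def demo_two_clients():
    server = ThreadedEchoServer(("127.0.0.1", 0), EchoHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=5) as first, \
             socket.create_connection(("127.0.0.1", port), timeout=5) as second:
            # The second client gets served while the first is still connected.
            second.sendall(b"second")
            reply_second = recv_exact(second, 6)
            first.sendall(b"first")
            reply_first = recv_exact(first, 5)
    finally:
        server.shutdown()
        server.server_close()
    return reply_first, reply_second

if __name__ == "__main__":
    print(demo_two_clients())
```

Other designs, such as one process per client or I/O multiplexing with select, address the same problem with different trade-offs.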
Python Implementation
https://docs.python.org/3/library/socket.html
• DistributedSystems\4.Communication\echoserver.py
• DistributedSystems\4.Communication\echoclient.py
Sockets Disadvantages
• Sockets are rather low level and programming mistakes are easily made.
• Sockets are generally used in synchronous client-server communication
• What is the alternative? → ZeroMQ
Using messaging patterns: ZeroMQ
• ZeroMQ uses TCP i.e. connection-oriented communication
• ZeroMQ is an asynchronous messaging-based framework built on top
of sockets to provide additional functionality using sockets.
• In ZeroMQ, sockets may be bound to multiple addresses allowing
the server to handle messages from multiple sources
• ZeroMQ supports one-to-many and many-to-one communication.
ZeroMQ:
https://zeromq.org/languages/python/
or https://github.com/zeromq/pyzmq
• ZeroMQ Provides a higher level of expression by pairing sockets: one
for sending messages at process P and a corresponding one at
process Q for receiving messages.
• Install: pip install pyzmq
• Three patterns
• Request-reply: zmq_request_reply_server.py, zmq_request_reply_client.py
• Publish-subscribe: zmq_publish_subscribe_server.py, zmq_publish_subscribe_client.py
• Pipeline: zmq_pipeline_master.py, zmq_pipeline_worker.py
Message Passing Interface (MPI)
https://www.open-mpi.org/
• A message-passing library specification provides
• extended message-passing model
• not a language or compiler specification
• not a specific implementation or product
• Provides lots of flexibility
• Can be used for parallel computers, clusters or heterogeneous networks
• Supports TCP/IP but also private network

• Install in Ubuntu:
• sudo apt-get update && sudo apt-get install infiniband-diags ibverbs-utils \
libibverbs-dev libfabric1 libfabric-dev libpsm2-dev -y
• sudo apt-get install openmpi-bin openmpi-common libopenmpi-dev libgtk2.0-dev
• sudo apt-get install librdmacm-dev libpsm2-dev
MPI: Hello World
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

rank → a particular process/thread
size → total number of processes/threads

DistributedSystems\4.Communication\mpi_hello.c
How to Compile and Run
• mpicc: compiler for MPI
mpicc mpi_hello.c -o mpi_hello.out
• mpiexec or mpirun for executing the code
mpiexec -np 4 ./mpi_hello.out → runs 4 processes
mpiexec -np 6 ./mpi_hello.out → if you request more processes than the number of cores available, you get an error like the one below:
"There are not enough slots available in the system to satisfy the 6 slots that were requested by the application"
MPI – Point to Point Communication
• DistributedSystems\4.Communication\mpi_mm.c
Operation | Description | Example
MPI_Bsend | Append outgoing message to a local send buffer (transient asynchronous) |
MPI_Send | Send a message and wait (blocking) until copied to local or remote buffer | DistributedSystems\4.Communication\mpi_blocking-p2p.c
MPI_Ssend | Send a message and wait until transmission starts |
MPI_Sendrecv | Send a message and wait for reply (similar to RPC) |
MPI_Isend | Pass reference to outgoing message, and continue | DistributedSystems\4.Communication\mpi_non-blocking-p2p.c
MPI_Issend | Pass reference to outgoing message, and wait until receipt starts |
MPI_Recv | Receive a message; block if there is none | DistributedSystems\4.Communication\mpi_blocking-p2p.c
MPI_Irecv | Check if there is an incoming message, but do not block | DistributedSystems\4.Communication\mpi_non-blocking-p2p.c
MPI – Collective Communication
• DistributedSystems\4.Communication\mpi_pi.c

Operation | Description | Example
MPI_Bcast | Broadcasts a message from the process with rank root to all other processes of the group. | DistributedSystems\4.Communication\mpi_Broadcast.c
MPI_Reduce | Reduces values on all processes within a group. | DistributedSystems\4.Communication\mpi_reduce.c
MPI_Scatter | Sends data from one task to all tasks in a group. | DistributedSystems\4.Communication\mpi_scatter.c
MPI_Gather | Gathers values from a group of processes. | DistributedSystems\4.Communication\mpi_gather.c
Message Queuing (MQ) or Message-Oriented Middleware (MOM)
• Asynchronous persistent communication through the support of middleware-level queues. Queues correspond to buffers at communication servers.
Basic Operations
Operation | Description
put | Append a message to a specified queue
get | Block until the specified queue is nonempty, and remove the first message
poll | Check a specified queue for messages, and remove the first. Never block
notify | Install a handler to be called when a message is put into the specified queue
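The four operations can be sketched for a single in-memory queue using Python's standard `queue` module. The class `MessageQueue` is a hypothetical illustration: a real message-queuing system would keep the queue at a communication server and persist it, which this sketch does not do.

```python
import queue

class MessageQueue:
    """In-memory stand-in for one middleware-level queue."""

    def __init__(self):
        self._messages = queue.Queue()
        self._handlers = []

    def put(self, message):
        """Append a message and call any installed handlers."""
        self._messages.put(message)
        for handler in self._handlers:
            handler(message)

    def get(self):
        """Block until the queue is nonempty, then remove the first message."""
        return self._messages.get()

    def poll(self):
        """Remove the first message if there is one; never block."""
        try:
            return self._messages.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        """Install a handler to be called when a message is put into the queue."""
        self._handlers.append(handler)

if __name__ == "__main__":
    q = MessageQueue()
    q.notify(lambda m: print("arrived:", m))
    q.put("order-1")
    print(q.get(), q.poll())
```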
General Model
• Queue managers: Queues are managed by queue managers. An application can put messages only into a local queue. Getting a message is possible by extracting it from a local queue only ⇒ queue managers need to route messages
• Routing
Message Broker
• Observation: Message queuing systems assume a common messaging
protocol: all applications agree on message format (i.e., structure
and data representation)
• Broker handles application heterogeneity in an MQ system:
• Transforms incoming messages to target format
• Very often acts as an application gateway
• May provide subject-based routing capabilities (i.e., publish-subscribe
capabilities)
Message Broker General
Architecture
IBM's WebSphere MQ
• Basic concepts
• Application-specific messages are put into, and removed from queues
• Queues reside under the regime of a queue manager
• Processes can put messages only in local queues, or through an RPC mechanism
• Message transfer
• Messages are transferred between queues
• Message transfer between queues at different processes, requires a channel
• At each end point of channel is a message channel agent
• Message channel agents are responsible for:
• Setting up channels using lower-level network communication facilities (e.g., TCP/IP)
• (Un)wrapping messages from/in transport-level packets
• Sending/receiving packets
IBM's WebSphere MQ: Schematic
Overview
• Channels are
inherently
unidirectional
• Automatically start
MCAs when
messages arrive
• Any network of
queue managers can
be created
• Routes are set up
manually (system
administration)
Message Channel Agents
• Some attributes associated with message channel agents

Attribute | Description
Transport type | Determines the transport protocol to be used
FIFO delivery | Indicates that messages are to be delivered in the order they are sent
Message length | Maximum length of a single message
Setup retry count | Specifies maximum number of retries to start up the remote MCA
Delivery retries | Maximum times MCA will try to put received message into queue
IBM's WebSphere MQ: Routing
• By using logical names, in combination with name resolution to local
queues, it is possible to put a message in a remote queue
Multicast Communication
• Application-level tree-based multicasting
• Flooding-based multicasting
• Gossip-based data dissemination
Application-level Multicasting
• Organize nodes of a distributed system into an overlay network
and use that network to disseminate data:
• Oftentimes a tree, leading to unique paths
• Alternatively, also mesh networks, requiring a form of routing
Application-level Multicasting in Chord
• Initiator generates a multicast identifier mid.
• Lookup succ(mid), the node responsible for mid.
• Request is routed to succ(mid), which will become the root.
• If P wants to join, it sends a join request to the root.
• When request arrives at Q:
• Q has not seen a join request before ⇒ it becomes forwarder; P becomes child of Q. Join request continues to be forwarded.
• Q knows about tree ⇒ P becomes child of Q. No need to forward join request anymore.
ALM: Some Costs
• Different Metrics:
• Link stress: How often does an ALM message cross the same physical link? Example: message from A to D needs to cross (Ra, Rb) twice.
• Stretch: Ratio in delay between ALM-level path and network-level path. Example: messages B to C follow path of length 73 at ALM, but 47 at network level ⇒ stretch = 73/47 ≈ 1.55.
Flooding
• P simply sends a message m to each of its neighbors. Each neighbor will forward that message, except to P, and only if it had not seen m before.
• Performance: The more edges, the more expensive!
[Figure: The size of a random overlay as function of the number of nodes]
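The cost claim can be made concrete by counting messages in a small simulation. This is a sketch under simple assumptions: the overlay is an undirected graph given as an adjacency dictionary, and every forwarded copy counts as one message. The function name `flood` and the example graphs are made up for illustration.

```python
from collections import deque

def flood(graph, source):
    """Flood a message from source; return (nodes reached, messages sent)."""
    seen = {source}
    messages = 0
    pending = deque([(source, None)])       # (node, neighbor it received m from)
    while pending:
        node, came_from = pending.popleft()
        for neighbor in graph[node]:
            if neighbor == came_from:
                continue                     # do not send m back to the sender
            messages += 1
            if neighbor not in seen:         # forward only on first receipt
                seen.add(neighbor)
                pending.append((neighbor, node))
    return seen, messages

if __name__ == "__main__":
    line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
    print(flood(line, "A"))   # a tree: one message per edge
    print(flood(ring, "A"))   # the extra edge causes redundant messages
```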
Variation
Let Q forward a message with a certain
probability pflood , possibly even dependent on
Epidemic protocols
• Assume there are no write–write conflicts
• Update operations are performed at a single server
• A replica passes updated state to only a few neighbors
• Update propagation is lazy, i.e., not immediate
• Eventually, each update should reach every replica

• Two forms of epidemics


• Anti-entropy: Each replica regularly chooses another replica at random, and
exchanges state differences, leading to identical states at both afterwards
• Rumor spreading: A replica which has just been updated (i.e., has been
contaminated), tells a number of other replicas about its update
(contaminating them as well).
Anti-entropy
• Principle operations:
• A node P selects another node Q from the system at random.
• Pull: P only pulls in new updates from Q
• Push: P only pushes its own updates to Q
• Push-pull: P and Q send updates to each other

• Observation: For push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes (round = when every node has taken the initiative to start an exchange).
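A small simulation gives a feel for how few rounds push-pull needs. This is a rough sketch under stated assumptions: one node starts with the update, each node picks its partner uniformly at random, and exchanges within a round are applied one after another, which can spread the update slightly faster than a strictly synchronous model. The function name and parameters are made up for illustration.

```python
import random

def push_pull_rounds(n, seed=0, max_rounds=1000):
    """Count rounds until all n nodes hold the update under push-pull anti-entropy."""
    if n < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True                      # single source of the update
    rounds = 0
    while not all(updated) and rounds < max_rounds:
        rounds += 1
        for p in range(n):                 # every node takes the initiative once
            q = rng.randrange(n - 1)       # pick a random partner other than p
            if q >= p:
                q += 1
            if updated[p] or updated[q]:   # push-pull: both end up updated
                updated[p] = updated[q] = True
    return rounds

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, push_pull_rounds(n, seed=1))
```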
Anti-entropy: Analysis
• Basics: Consider a single source, propagating its update. Let $p_i$ be the probability that a node has not received the update after the $i$th round.

Analysis: staying ignorant
• With pull, $p_{i+1} = (p_i)^2$: the node was not updated during the $i$th round and should contact another ignorant node during the next round.
• With push, $p_{i+1} = p_i \left(1 - \frac{1}{N-1}\right)^{(N-1)(1-p_i)} \approx p_i e^{-1}$ (for small $p_i$ and large $N$): the node was ignorant during the $i$th round and no updated node chooses to contact it during the next round.
• With push-pull: $(p_i)^2 \cdot (p_i e^{-1})$
Anti-entropy: Performance
Rumor Spreading
• A server S having an update to report, contacts other servers. If a
server is contacted to which the update has already propagated,
S stops contacting other servers with probability pstop.
• Observation: If s is the fraction of ignorant servers (i.e., which are unaware of the update), it can be shown that with many servers
$s = e^{-(1/p_{stop} + 1)(1 - s)}$
Formal Analysis
Rumor Spreading: The effect of
stopping
Note
If we really have to ensure that all servers are eventually updated, rumor spreading alone is not enough
1/pstop | s | Ns
1 | 0.203188 | 2032
2 | 0.059520 | 595
3 | 0.019827 | 198
4 | 0.006977 | 70
5 | 0.002516 | 25
6 | 0.000918 | 9
7 | 0.000336 | 3
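The s column can be reproduced by solving the rumor-spreading equation s = e^(-(1/pstop + 1)(1 - s)) numerically. The sketch below uses plain fixed-point iteration starting from s = 0, which is an assumption of this example rather than a method given in the slides; starting at 0 avoids the trivial solution s = 1.

```python
import math

def ignorant_fraction(inv_p_stop, iterations=200):
    """Solve s = exp(-(1/p_stop + 1) * (1 - s)) by fixed-point iteration."""
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(inv_p_stop + 1) * (1 - s))
    return s

if __name__ == "__main__":
    for k in range(1, 8):
        print(k, round(ignorant_fraction(k), 6))
```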
Deleting Values
• Fundamental problem: We cannot remove an old value from a server
and expect the removal to propagate. Instead, mere removal will
be undone in due time using epidemic algorithms
• Solution: Removal has to be registered as a special update by
inserting a death certificate
Deleting Values
• When to remove a death certificate (it is not allowed to stay for ever)
• Run a global algorithm to detect whether the removal is known everywhere,
and then collect the death certificates (looks like garbage collection)
• Assume death certificates propagate in finite time, and associate a maximum
lifetime for a certificate (can be done at risk of not reaching all servers)

• Note: It is necessary that a removal actually reaches all servers.
