Distributed Systems Lecture Notes (Final)
1 Introduction
▶ Introduction
▶ Processes
▶ Communication
▶ Coordination
▶ Distributed programming
Introduction
Distributed systems
• Definitions
• Design goals
• Classification
• Pitfalls
Basic Concepts
1 Introduction
Definition 2:
A distributed system is a networked computer system in which processes and
resources are sufficiently spread across multiple computers ⇐ Expansive view
Examples
• Google (Search, Mail, etc.)
• Finance and commerce (Banks, Amazon, eBay, etc.)
• Content Delivery Network (CDN)
• Telecommunication
Design Goals
1 Introduction
Distribution transparency
Design Goals
The distribution of processes and resources is transparent, that is, invisible, to end users
and applications.
Distribution transparency
• Access – Hide differences in data representation and how an object is accessed
• Location – Hide where an object is located
• Relocation – Hide that an object may be moved to another location while in use
• Migration – Hide that an object may move to another location
• Replication – Hide that an object is replicated
• Concurrency – Hide that an object may be shared by several independent users
• Failure – Hide the failure and recovery of an object
Degree of distribution transparency
Distribution transparency
Example
Video streams (failing to access a server)
• How do we hide transmission delays in wide-area distributed systems?
• How do we distinguish a slow system from a failing one?
Note:
Distribution transparency is a nice goal, but achieving it is a different story.
Openness
Design Goals
The system communicates with services of other systems irrespective of the underlying
environment.
Openness
• Interoperability – components from different manufacturers can co-exist and work
together through a common standard.
• Portability – an application can be executed on different distributed systems without
modification.
• Extensibility – new components can be added, or existing ones replaced, without
affecting the other components.
Separating policy from mechanism
Openness
Dependability
Design Goals
Requirements
• Availability – the probability of operating correctly at any given moment.
• Reliability – continue to work without interruption.
• Safety – no catastrophic event happens when the system temporarily fails.
• Maintainability – how easily a failed system can be repaired.
Scalability
Design Goals
Scalability
Three components:
• Size scalability – add users/resources without noticeable loss of performance.
• Geographical scalability – users and resources may lie far apart, but communication
delay is hardly noticed.
• Administrative scalability – can be easily managed even if it spans many
independent administrative organizations.
Scaling techniques
Scalability
Scaling out
• Hiding communication latencies – avoid waiting for responses to remote-service
requests as much as possible. However, this approach does not suit every
application.
• Partitioning and distribution – partition data and computations across multiple
machines. Example: World Wide Web documents
• Replication – availability, load balance and latency
— Problem: consistency
Classification of distributed systems
1 Introduction
Cluster computing
Distributed computing systems
Cluster computing plays a crucial role in distributed systems by enabling efficient
resource utilization, parallel processing, fault tolerance, and scalability. It focuses on a
single organization's resources.
Cluster computing
• Homogeneous: same OS, near-identical hardware
• Single managing node
Grid computing
Distributed computing systems
Grid computing in distributed systems enables efficient resource sharing, collaboration,
and scalability across organizational boundaries, making it a powerful paradigm for
tackling large-scale computational challenges.
Grid computing
• Fabric layer – shared heterogeneous resources
• Connectivity layer – communication and security
protocols to authenticate and transfer data
• Resource layer – protocols for operating on a single
shared resource. E.g., get configuration, create
process
• Collective layer – coordinates sharing of resources.
E.g., allocation and scheduling of tasks onto multiple
resources
• Application layer – applications running on the grid.
Cloud computing
Distributed computing systems
Cloud computing provides a scalable and flexible computing model, enabling efficient
resource utilization and empowering users to focus on their core business or tasks
without the burden of infrastructure management.
Cloud computing
Four layers
• Hardware – physical storage, processors, network
devices, etc.
• Infrastructure – provides virtually unlimited “raw”
computing, storage, and network resources
• Platform – set of tools or middleware that are used to
develop or deploy applications on the cloud
• Application – running applications.
Distributed information systems
Classification
Transaction primitives
• BEGIN_TRANSACTION – Mark the start of a transaction
• END_TRANSACTION – Terminate the transaction and try to commit
• ABORT_TRANSACTION – Kill the transaction and restore the old values
• READ – Read data from a file, a table, or otherwise
• WRITE – Write data to a file, a table, or otherwise
Transaction processing
Distributed information systems
Properties of transactions
Transactions adhere to the so-called ACID properties:
• Atomicity – All operations either succeed, or all of them fail. When the transaction
fails, the state of the object will remain unaffected by the transaction.
• Consistency – A transaction establishes a valid (predefined) state transition, i.e., it
does not corrupt your data or create unintended consequences for the integrity of
your tables.
• Isolation – Concurrent transactions do not interfere with each other. Transactions
appear to execute one after the other.
• Durability – After the execution of a transaction, its effects are made permanent,
i.e., changes to the state survive failures.
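As an illustration (not from the slides), these primitives map naturally onto a database API. A minimal sketch using Python's sqlite3 module; the account table and amounts are assumptions:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 1000.0)")
conn.commit()

try:
    # BEGIN_TRANSACTION is implicit on the first modifying statement
    conn.execute("UPDATE account SET balance = balance + 100 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance * 1.01 WHERE id = 1")
    conn.commit()      # END_TRANSACTION: try to commit
except sqlite3.Error:
    conn.rollback()    # ABORT_TRANSACTION: restore the old values

print(conn.execute("SELECT balance FROM account").fetchone())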
Distributed pervasive systems
Classification
Emerging next-generation of distributed systems in which nodes are small, mobile, and
often embedded in a larger system, characterized by the fact that the system naturally
blends into the user’s environment.
Pervasive systems
Types (overlapping characteristics):
• Ubiquitous computing systems
• Mobile computing systems
• Sensor networks
Ubiquitous computing systems
Pervasive systems
Pervasive and continuously present, i.e., the user is continuously interacting with the
system.
Core requirements
• Distribution – Devices are networked, distributed, and accessible transparently
(hidden from view)
• Interaction – Interaction between users and devices is highly unobtrusive (implicit)
• Context awareness – The system is aware of a user’s context to optimize interaction
• Autonomy – Devices operate autonomously without human intervention, and are
thus highly self-managed
• Intelligence – The system as a whole can handle a wide range of dynamic actions and
interactions (AI)
Mobile computing systems
Pervasive systems
Pervasive, but emphasis is on the fact that devices are inherently mobile.
Sensor networks
Pervasive systems
Pervasive, with emphasis on the actual (collaborative) sensing and actuation of the
environment.
Sensor networks
Characteristics of sensor networks:
• 10s - 1000s of small nodes, each equipped with one or more sensing devices.
• Wireless and often battery powered
• Limited resources, i.e., small memory/compute/communication capacity, which
keeps power consumption low.
Sensor networks as distributed system
Pervasive systems
Sensor networks
Pitfalls
1 Introduction
False assumptions
• The network is reliable
• The network is secure
• The network is homogeneous
• The topology does not change
• Bandwidth is infinite
• Transport cost is zero
• There is one administrator
Summary
1 Introduction
• The goal of a distributed system is to spread processes and resources sufficiently,
not necessarily, across different computers.
• Design goals for distributed systems include sharing resources, ensuring openness,
distribution transparency, and scalability.
• Different types of distributed systems exist which can be classified as being oriented
toward supporting computations, information processing, and pervasiveness.
Distributed Systems
2 Processes
▶ Introduction
▶ Processes
▶ Communication
▶ Coordination
▶ Distributed programming
Processes
Distributed systems
• Threads
• Virtualization
• Clients and Servers
• Code Migration
Threads
2 Processes
Basic concepts
• Processor – Provides a set of instructions along with the capability of automatically
executing a series of those instructions.
• Thread – A minimal software processor in whose context a series of instructions can
be executed. Saving a thread context implies stopping the current execution and
saving all the data needed to continue the execution at a later stage.
• Process – A software processor in whose context one or more threads may be
executed. Executing a thread means executing a series of instructions in the context
of that thread.
Context switching
Threads
Context switching
• Threads share the same address space. Thread context switching can be done
entirely independent of the operating system.
• Creating and destroying threads is much cheaper than doing so for processes.
Context switching
Threads
Context switching
• Threads use the same address space: more prone to errors
• No support from OS/HW to protect threads against using each other’s memory
• Thread context switching may be faster than process context switching
Python example
2 Processes
from multiprocessing import Process
from time import gmtime, sleep
from random import randint

def sleeper(name):
    t = gmtime()
    s = randint(1, 20)
    txt = str(t.tm_min) + ':' + str(t.tm_sec) + ' ' + name + ' is going to sleep for ' + str(s) + ' seconds'
    print(txt)

    sleep(s)
    t = gmtime()
    txt = str(t.tm_min) + ':' + str(t.tm_sec) + ' ' + name + ' has woken up'
    print(txt)

if __name__ == '__main__':
    p = Process(target=sleeper, args=('eve',))
    q = Process(target=sleeper, args=('bob',))
    p.start(); q.start()
    p.join(); q.join()
Output:
46:9 eve is going to sleep for 4 seconds
46:9 bob is going to sleep for 13 seconds
46:13 eve has woken up
46:22 bob has woken up
Python example
2 Processes
def sleeper(name):
    print(name + ' sees shared x being ' + str(shared_x))
    sleeplist = list()
    for i in range(3):
        subsleeper = Thread(target=sleeping, args=(name + ' ' + str(i),))
        sleeplist.append(subsleeper)

    for s in sleeplist: s.start()
    for s in sleeplist: s.join()

    print(name, 'sees shared x being', shared_x)
Output:
eve sees shared x being 68
bob sees shared x being 68
48:14 eve 0 is going to sleep for 19 seconds
48:14 eve 1 is going to sleep for 10 seconds
48:14 eve 2 is going to sleep for 11 seconds
48:14 bob 0 is going to sleep for 18 seconds
48:14 bob 1 is going to sleep for 11 seconds
48:14 bob 2 is going to sleep for 5 seconds
48:19 bob 2 has woken up, seeing shared x being 69
48:24 eve 1 has woken up, seeing shared x being 69
48:25 bob 1 has woken up, seeing shared x being 70
48:25 eve 2 has woken up, seeing shared x being 70
48:32 bob 0 has woken up, seeing shared x being 71
bob sees shared x being 71
48:33 eve 0 has woken up, seeing shared x being 71
eve sees shared x being 71
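The Thread import, the global shared_x, and the sleeping helper are not shown on this slide; a plausible completion consistent with the output above (names and values are assumptions):

from random import randint
from threading import Thread
from time import gmtime, sleep

shared_x = randint(10, 99)   # assumed: an integer shared by all threads

def sleeping(name):
    global shared_x
    t = gmtime()
    s = randint(1, 20)
    print(str(t.tm_min) + ':' + str(t.tm_sec) + ' ' + name + ' is going to sleep for ' + str(s) + ' seconds')
    sleep(s)
    shared_x = shared_x + 1  # unsynchronized update of shared state
    t = gmtime()
    print(str(t.tm_min) + ':' + str(t.tm_sec) + ' ' + name + ' has woken up, seeing shared x being ' + str(shared_x))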
Threads and operating systems
Threads
Threads and operating systems
Threads
Kernel solution:
The whole idea is to have the kernel contain the implementation of a thread package,
i.e., thread operations (creation, deletion, synchronization, etc.) require a system call.
Kernel solution
• Operations that block a thread are no longer a problem: the kernel schedules
another available thread within the same process.
• The problem is the loss of efficiency because each thread operation requires a trap
to the kernel (context switching becomes expensive).
Threads and Distributed Systems
Threads
Improve performance
• Having a single-threaded server prohibits simple scale-up to a multiprocessor
system.
• As with clients: hide network latency by handling the next request while the
previous one is being replied to.
• Starting a thread is much cheaper than starting a new process.
Threads and Distributed Systems
Threads
Virtualization
2 Processes
Mimicking interfaces
Virtualization
Mimicking interfaces
Four types of interfaces at three different levels:
1. Instruction set architecture: the set of machine instructions, with two subsets:
— Privileged instructions: allowed to be executed only by the operating system.
— General instructions: can be executed by any program.
2. System calls as offered by an operating system.
3. Library calls, known as an application programming interface (API)
Ways of virtualization
Virtualization
Ways of virtualization
1. Process VM – Instructions can be interpreted (as is the case for the Java runtime
environment).
2. Native VMM – runs directly on the hardware, mimicking its instruction set ⇒ a
complete operating system and its applications can be supported
3. Hosted VMM – Low-level instructions, but delegating most work to a full-fledged OS
(Example: VMware, VirtualBox).
Containers
Virtualization
Containers
• Reduced instance of virtualization
• A container holds only the necessary OS
components (binaries/images/libraries) that are
needed for that specific application to run.
• Virtualizes the software environment for an
application.
• Applications and processes operating in different
containers need to be isolated from each other
Client-server
2 Processes
Interaction
• Application-level – A networked application with its own protocol. Example: calendar
synchronization.
• Direct remote access – A general solution to allow access to remote applications, i.e.,
the server provides a convenient user interface.
Client-side software
Client-server
• Failure transparency – client can mask server and communication failures. Example:
repeatedly attempt to connect to a server, or try another server
Servers
Client-server
General organization
A process implementing a specific service on behalf of a collection of clients. It waits
for an incoming request from a client and subsequently ensures that the request is
taken care of, after which it waits for the next incoming request.
Out-of-band communication
Client-server
Is it possible to interrupt a server once it has accepted (or is in the process of accepting) a
service request?
Solution 1: Use a separate port for urgent data
• Server has a separate thread/process for urgent messages
• Urgent message comes in ⇒ associated request is put on hold
• Note: this requires that the OS support priority-based scheduling
Servers and state
Client-server
Stateless servers
• Do not keep information about the status of a client after having handled a request.
• No disruption of the service offered by the server if information is lost.
• Soft state: maintain state on behalf of the client, but only for a limited time.
Stateful servers
• Keeps track of the status of its clients
• Record that a file has been opened, so that prefetching can be done
• Knows which data a client has cached, and allows clients to keep local copies of
shared data ((client, file) table).
The performance of stateful servers can be extremely high, provided clients are allowed
to keep local copies.
Server clusters
Client-server
Note: The first tier is generally responsible for passing requests to an appropriate server:
request dispatching.
Request Handling
Client-server
TCP handoff
Having the first tier handle all communication from/to the cluster may lead to a
bottleneck.
Code Migration
2 Processes
Performance
• Ensuring that servers in a data center are sufficiently loaded (e.g., to prevent waste
of energy) ⇒ Load distribution/algorithms
• Minimizing communication by ensuring that computations are close to where the
data is (think of MEC).
Flexibility
Code Migration
Flexibility
• Moving code to a client when needed (design flexibility)
• Dynamically moving code requires a protocol for downloading and initializing.
• Downloaded code should be executable on the client’s machine.
Avoids pre-installing software and increases dynamic configuration.
Privacy and security
Code Migration
Models for code migration
Code Migration
Strong and weak mobility
Code Migration
Weak mobility
Move only code and data segment (a transferred program is always started anew):
• Relatively simple, especially if code is portable
• Distinguish code shipping (push) from code fetching (pull)
Strong mobility
Move component, including execution state (execution segment can be transferred)
• Migration – move entire object from one machine to the other
• Cloning – start a clone, and set it in the same execution state. Example: fork()
Heterogeneous systems
Code Migration
Main problem
• The target machine may not be suitable to execute the migrated code
• The definition of process/thread/processor context is highly dependent on local
hardware, operating system and runtime system
Solution
Migrate not only processes but entire computing environments.
• Interpreted languages, effectively having their own VM
• Virtual machine monitors (think of virtual machines)
Distributed Systems
3 Communication
▶ Introduction
▶ Processes
▶ Communication
▶ Coordination
▶ Distributed programming
Communication
Distributed systems
Top layers
Layered protocols
Transport layers
Layered protocols
The transport layer provides the actual communication facilities for most distributed
systems. It determines how data should be delivered.
Standard Internet protocols
• TCP – connection-oriented, reliable, stream-oriented communication
• UDP – unreliable (best-effort) datagram communication
Low-level layers
Layered protocols
Recap
• Physical layer – contains the specification and implementation of bits, and their
transmission between sender and receiver.
• Data link layer – prescribes the transmission of a series of bits into a frame to allow
for error and flow control
• Network layer – describes how packets in a network of computers are to be routed
(i.e., path determination).
Types of communication
3 Communication
Distinguish
• Transient versus persistent communication
• Asynchronous versus synchronous communication
Messaging
Types of communication
Message-oriented middleware
Aims at high-level persistent asynchronous communication:
• Processes send each other messages, which are queued
• Sender need not wait for immediate reply, but can do other things
Remote Procedure Call (RPC)
3 Communication
Basic RPC operations
Remote Procedure Call
• Application developers are familiar with simple procedure model
• Well-engineered procedures operate in isolation (black box)
• Client and server stubs pack the parameters into a message and request that the
message be sent
1. Client procedure calls client stub.
2. Stub builds message; calls local OS.
3. OS sends message to remote OS.
4. Remote OS gives message to stub.
5. Stub unpacks parameters and calls server.
6. Server makes local call and returns result to stub.
7. Stub builds message; calls OS.
8. OS sends message to client’s OS.
9. Client’s OS gives message to stub.
10. Client stub unpacks result and returns to the client.
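To make the stub idea concrete, here is a small sketch (not from the slides) using Python's xmlrpc standard library, which generates the stubs and message handling for us; the host, port, and add function are assumptions:

# server.py – registers a procedure that remote clients can call
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(('localhost', 8000))
server.register_function(add, 'add')
server.serve_forever()

# client.py – the proxy object plays the role of the client stub
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy('http://localhost:8000')
print(proxy.add(2, 3))  # marshals the parameters, sends the request, unpacks the result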
Parameter passing
Remote Procedure Call
Parameter marshaling
There’s more than just wrapping parameters into a message:
• Client and server need to properly interpret messages, transforming them into
machine-dependent representations (e.g., byte ordering).
• Client and server have to agree on the same encoding:
— How are basic data values represented (integers, floats, characters) and
— How are complex data values represented (arrays)
Stub generation
Define protocols (message formats, data representations, delivery method)
Parameter passing
Remote Procedure Call
Support
• Forbid pointers and reference parameters
• Copy the entire data structure to which the parameter is referring, effectively
replacing the copy-by-reference mechanism by copy-by-value
• Global references: client and the server have access to the same file system
Asynchronous RPCs
Remote Procedure Call
Essence
Try to get rid of the strict request-reply behavior and let the client continue without
waiting for an answer from the server. Useful when:
• No result to return
• Multiple RPCs need to be performed
Sending out multiple RPCs
Remote Procedure Call
Sending an RPC request to a group of servers.
Consideration
• The client may be unaware that multiple servers exist. Example: a fault-tolerant
system using a multicast address
• Should the client proceed after one response, or only after all responses have been
received?
Message-Oriented Communication
3 Communication
• Transient Messaging
• Message-Queuing System
• Message Brokers
Transient messaging: sockets
Message-Oriented Communication
Queue-based messaging
Message-Oriented Communication
Four possible combinations:
Message-oriented middleware
Message-Oriented Communication
Essence
Asynchronous persistent communication through support of middleware-level queues.
Queues correspond to buffers at communication servers.
General model
Message-Oriented Communication
Queue managers
Queues are managed by queue managers. An application can put messages only into a
local queue. Getting a message is possible by extracting it from a local queue only ⇒
queue managers need to route messages.
Routing – special queue managers that forward incoming messages to other queue
managers.
Message broker
Message-Oriented Communication
Message queuing systems assume a common messaging protocol: all applications agree
on message format (i.e., structure and data representation)
Message broker
Centralized component that takes care of application heterogeneity in an MQ system:
• Transforms incoming messages to target format
• Very often acts as an application gateway
Message broker
Message-Oriented Communication
General architecture
Application-level multicasting
Multicast communication
Organize nodes of a distributed system into an overlay network and use that network to
disseminate data.
• Link stress – defined per link; counts how often a packet crosses the same physical
link. Example: a message from A to D needs to cross 〈Ra, Rb〉 twice.
• Stretch – ratio in delay between ALM-level path and network-level path. Example:
messages B to C follow path of length 73 at ALM, but 47 at network level ⇒ stretch =
73/47.
Flooding
Multicast communication
Rather than broadcasting, multicasting minimizes the use of intermediate nodes for
which the message is not intended.
• Construct an overlay network per multicast group.
— A node needs to maintain a separate list of neighbors if it belongs to several groups.
Flooding
For an overlay corresponding to a multicast group, we need to broadcast a message.
• P simply sends a message m to each of its neighbors. Each neighbor will forward that
message, except to P, and only if it had not seen m before.
Flooding performance
Multicast communication
Overlay network:
• G = (V, E)
• For an undirected graph, the total number of messages is
M_tot = δ(v0) + Σ_{v ∈ V∖{v0}} (δ(v) − 1) = 2|E| − |V| + 1,
where δ(v) is the number of neighbors of node v and v0 is the initiator.
Performance
For a fully connected graph, |E| = |V|(|V| − 1)/2, leading to an order of |V|² messages
⇒ O(N²)
Probabilistic approach
Multicast communication
Assumption: no information on the structure of the overlay network.
• Random graph representation – a graph in which any two vertices are joined by an
edge with probability p_edge.
• Expected number of edges: |E| = p_edge · |V| · (|V| − 1)/2
Epidemic protocols
Multicast communication
Anti-entropy
Multicast communication
Principle operations
A node P selects another node Q from the system at random.
• Push – P only pushes its own updates to Q
• Pull – P only pulls in new updates from Q
• Push-pull – P and Q send updates to each other
For push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes (a round ⇒
every node has taken the initiative to start an exchange).
Gossip-based data dissemination
Multicast communication
Principle operations
A server S that has an update to report contacts other servers. If a server is contacted to
which the update has already propagated, S stops contacting other servers with
probability pstop .
If s is the fraction of ignorant servers (i.e., which are unaware of the update), it can be
shown that with many servers
s = e^(−(1/p_stop + 1)(1 − s))
Note: it cannot guarantee that all nodes will actually be updated.
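For intuition (not on the slide), s can be computed by fixed-point iteration; a small sketch, with p_stop = 0.2 as an assumed value:

import math

# Solve s = exp(-(1/p_stop + 1) * (1 - s)) by fixed-point iteration.
p_stop = 0.2   # assumed stopping probability
s = 0.5        # initial guess for the ignorant fraction
for _ in range(100):
    s = math.exp(-(1 / p_stop + 1) * (1 - s))
print(s)       # ~0.0025: a small fraction never hears the update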
Effect of stopping
Multicast communication
Distributed Systems
4 Coordination
▶ Introduction
▶ Processes
▶ Communication
▶ Coordination
▶ Distributed programming
Coordination
Distributed systems
• Logical clocks
• Mutual exclusion
• Election algorithm
Clock synchronization
4 Coordination
Centralized systems – time is unambiguous, i.e., system clock keeps time and all entities
can use that
Distributed systems
Each node has its own clock
• Problem: an event that occurs after another may be assigned an earlier timestamp.
Example: make
Physical clock
4 Coordination
Clock synchronization
4 Coordination
Clock drift
• Each clock has a maximum drift rate ρ
• Two clocks may drift apart at a rate of up to 2ρ
• To keep any two clocks within δ of each other
— re-synchronize at least every δ/(2ρ) time units
Network Time Protocol
4 Coordination
When contacting the server, message delays will have outdated the reported time.
Estimation
• Offset θ (assumption: δT_req = T2 − T1 ≈ T4 − T3 = δT_res)
— θ = T3 + ((T2 − T1) + (T4 − T3))/2 − T4 = ((T2 − T1) + (T3 − T4))/2
◦ if θ < 0, the clock is set backward
• Delay δ
— δ = ((T4 − T1) − (T3 − T2))/2
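A tiny sketch (not from the slides) of the offset and delay computation, with the four timestamps as assumed example values:

def ntp_estimate(t1, t2, t3, t4):
    """t1: client send, t2: server receive, t3: server send, t4: client receive."""
    theta = ((t2 - t1) + (t3 - t4)) / 2   # clock offset estimate
    delta = ((t4 - t1) - (t3 - t2)) / 2   # one-way network delay estimate
    return theta, delta

# Example: the client's clock is behind by about 5 time units
print(ntp_estimate(10.0, 16.0, 17.0, 13.0))   # -> (5.0, 1.0)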
Lamport’s logical clocks
4 Coordination
What usually matters is not that all processes agree on exactly what time it is, but that
they agree on the order in which events occur. Requires a notion of ordering.
The happened-before relation
• If a and b are two events in the same process, and a comes before b, then a → b.
• If a is the sending of a message, and b is the receipt of that message, then a → b.
• If a → b and b → c, then a → c.
This introduces a partial ordering of events in a system with concurrently operating
processes.
Logical clocks
4 Coordination
How do we maintain a global view of the system’s behavior that is consistent with the
happened-before relation?
The notion of time
Attach a timestamp C(e) to each event e, satisfying the following properties:
P1 If a and b are two events in the same process, and a → b, then we demand that
C(a) < C(b).
P2 If a corresponds to sending a message m, and b to the receipt of that message, then
also C(a) < C(b).
Problem: How to attach a timestamp to an event when there’s no global clock ⇒
maintain a consistent set of logical clocks, one per process.
Logical clocks
4 Coordination
Solution
Each process Pi maintains a local counter Ci and adjusts this counter
1. For each new event that takes place within Pi , Ci is incremented by 1.
2. Each time a message m is sent by process Pi , the message receives a timestamp
ts(m) = Ci.
3. Whenever a message m is received by a process Pj , Pj adjusts its local counter Cj
to max{Cj , ts(m)} + 1.
Notes:
• Property P1 is satisfied by (1); Property P2 by (2) and (3).
• It can still occur that two events happen at the same time. Avoid this by breaking ties
through process IDs.
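A minimal sketch (not from the slides) of the three rules; pairing the counter with a process ID gives the tie-breaking mentioned above:

class LamportClock:
    def __init__(self, pid):
        self.pid = pid
        self.c = 0

    def internal_event(self):         # rule (1): any new local event
        self.c += 1
        return (self.c, self.pid)     # (counter, pid) breaks ties

    def send_message(self):           # rule (2): attach ts(m) = Ci
        self.c += 1                   # sending is itself an event
        return self.c

    def receive_message(self, ts_m):  # rule (3): Cj = max(Cj, ts(m)) + 1
        self.c = max(self.c, ts_m) + 1
        return (self.c, self.pid)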
Example
Logical clocks
Implementation
Logical clocks
Note
Adjustments take place in the middleware layer
Example: Totally ordered multicast
Logical clocks
Problem
Concurrent updates on a replicated database are seen in the same order everywhere
• P1 adds $100 to an account (initial value: $1000)
• P2 increments account by 1%
• There are two replicas
Result
In absence of proper synchronization: replica #1 ← $1111, while replica #2 ← $1110
(propagation delay).
Example: Totally ordered multicast
Logical clocks
Solution
• Process Pi sends timestamped message mi to all others. The message itself is put in
a local queue Qi .
• Any incoming message at Pj is queued in Qj , according to its timestamp, and
acknowledged to every other process.
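A sketch (not from the slides) of the local queue discipline; communication is abstracted away, and a message is delivered only when it heads the queue and has been acknowledged by all processes, so every replica delivers in the same timestamp order:

import heapq

class TotalOrderQueue:
    def __init__(self, num_procs):
        self.num_procs = num_procs
        self.queue = []   # messages ordered by (timestamp, sender id)
        self.acks = {}    # (timestamp, sender) -> set of acknowledging processes

    def on_message(self, ts, sender, msg):
        heapq.heappush(self.queue, (ts, sender, msg))

    def on_ack(self, ts, sender, acker):
        self.acks.setdefault((ts, sender), set()).add(acker)

    def try_deliver(self):
        if self.queue:
            ts, sender, msg = self.queue[0]
            if len(self.acks.get((ts, sender), set())) == self.num_procs:
                heapq.heappop(self.queue)
                return msg
        return None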
Vector clocks
4 Coordination
Observation
Lamport’s clocks do not guarantee that a causally preceded b if C(a) < C(b)
Obs.
Event a: m1 is received at T = 16;
Event b: m2 is sent at T = 20.
Note
We cannot conclude that a causally precedes b.
Vector clocks
4 Coordination
Definition
We say that b may causally depend on a if ts(a) < ts(b), with:
• for all k, ts(a)[k] ≤ ts(b)[k] and
• there exists at least one index k ′ for which ts(a)[k ′ ] < ts(b)[k ′ ]
Capturing potential causality
Vector clocks
Solution
Each Pi maintains a vector V Ci
• V Ci [i] is the local logical clock at process Pi .
• If V Ci [j] = k then Pi knows that k events have occurred at Pj .
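A small sketch (not from the slides) of vector-clock maintenance, together with the ts(a) < ts(b) test from the definition above:

class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n

    def internal_event(self):           # local event at Pi
        self.vc[self.pid] += 1

    def send_message(self):             # attach ts(m) = VCi
        self.vc[self.pid] += 1
        return list(self.vc)

    def receive_message(self, ts_m):    # merge element-wise, then step
        self.vc = [max(a, b) for a, b in zip(self.vc, ts_m)]
        self.vc[self.pid] += 1

def lt(ts_a, ts_b):                     # ts(a) < ts(b): b may causally depend on a
    return all(x <= y for x, y in zip(ts_a, ts_b)) and \
           any(x < y for x, y in zip(ts_a, ts_b))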
Causally ordered multicasting
4 Coordination
We can now ensure that a message is delivered only if all causally preceding messages
have already been delivered.
Adjustment
Pi increments V Ci [i] only when sending a message, and Pj "adjusts" V Cj when
receiving a message (i.e., effectively does not change V Cj [j]).
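The resulting delivery condition can be sketched as follows (not from the slides): Pj delays delivery of message m from Pi until m is the next message expected from Pi and Pj has already seen everything m causally depends on:

def can_deliver(ts_m, vc_j, i):
    # m from P_i is deliverable at P_j iff:
    #   ts(m)[i] == VC_j[i] + 1          (m is the next message from P_i)
    #   ts(m)[k] <= VC_j[k] for k != i   (P_j saw all messages m depends on)
    return ts_m[i] == vc_j[i] + 1 and all(
        ts_m[k] <= vc_j[k] for k in range(len(ts_m)) if k != i)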
Causally ordered multicasting
4 Coordination
Mutual exclusion
4 Coordination
Basic solutions
• Permission-based – a process wanting to enter its critical region, or access a
resource, needs permission from other process(es).
• Token-based – a token is passed between processes. The one who has the token may
proceed in its critical region, or pass it on when not interested.
Centralized
Mutual exclusion
(a) Process P1 asks the coordinator for permission to access a shared resource.
Permission is granted.
(b) Process P2 then asks permission to access the same resource. The coordinator does
not reply.
(c) When P1 releases the resource, it tells the coordinator, which then replies to P2.
Note: re-election during failure.
Distributed: Ricart & Agrawala
Mutual exclusion
Principle
The same as Lamport/total ordering except that acknowledgments aren’t sent. Instead,
replies (i.e. grants) are sent only when
• The receiving process has no interest in the shared resource; or
• The receiving process is waiting for the resource, but has lower priority (known
through comparison of timestamps).
In all other cases, reply is deferred, implying some more local administration.
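A sketch (not on the slide) of the receiver's decision logic; the state names and (timestamp, pid) priority encoding are assumptions:

def on_request(my_state, my_ts, my_pid, req_ts, req_pid):
    """my_state is 'RELEASED', 'WANTED', or 'HELD';
    lower (timestamp, pid) means higher priority."""
    if my_state == 'RELEASED':
        return 'GRANT'          # no interest in the shared resource
    if my_state == 'WANTED' and (req_ts, req_pid) < (my_ts, my_pid):
        return 'GRANT'          # requester has higher priority than we do
    return 'DEFER'              # queue the reply until we release the resource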
Example
Ricart & Agrawala
(a) Two processes want to access a shared resource at the same moment.
(b) P0 has the lowest timestamp, so it wins.
(c) When process P0 is done, it sends an OK also, so P2 can now go ahead.
Token ring algorithm
Mutual exclusion
Essence
Organize processes in a logical ring, and let a token be passed between them. The one
that holds the token is allowed to enter the critical region (if it wants to).
An overlay network constructed as a logical ring with a circulating token
Election algorithms
4 Coordination
Many algorithms require that some process acts as a coordinator. The question is how
to select this special process dynamically.
Note
• In many systems, the coordinator is chosen manually (e.g., file servers). This leads to
centralized solutions ⇒ single point of failure.
Election algorithms
4 Coordination
Assumptions
1. All processes have unique IDs
2. All processes know the IDs of all processes in the system (but not whether they are
up or down)
3. Election means identifying the process with the highest id that is up
Election by bullying
Election algorithms
Each process has an associated priority (weight). The process with the highest priority
should always be elected as the coordinator.
Election by bullying
Election algorithms
Election in a ring
Election algorithms
Process priority is obtained by organizing processes into a (logical) ring. The process with
the highest priority should be elected as coordinator.
• Any process can start an election by sending an election message to its successor. If a
successor is down, the message is passed on to the next successor.
• If a message is passed on, the sender adds itself to the list. When the message gets
back to the initiator, every process has had a chance to make its presence known.
• The initiator sends a coordinator message around the ring containing a list of all
living processes. The one with the highest priority is elected as coordinator.
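A compact simulation sketch (not from the slides); the alive map and the initiator are assumptions:

def ring_election(alive, initiator):
    """alive: dict pid -> bool; the election message collects live pids."""
    pids = sorted(alive)
    members = [initiator]                # the initiator adds itself first
    i = (pids.index(initiator) + 1) % len(pids)
    while pids[i] != initiator:          # pass the message around the ring
        if alive[pids[i]]:               # dead successors are skipped
            members.append(pids[i])
        i = (i + 1) % len(pids)
    return max(members)                  # highest live id becomes coordinator

# Example: process 3 starts an election; 6 is down, so 5 is elected.
print(ring_election({1: True, 2: True, 3: True, 5: True, 6: False}, 3))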
Election in a ring
Election algorithms
Distributed Systems
5 Distributed programming
▶ Introduction
▶ Processes
▶ Communication
▶ Coordination
▶ Distributed programming
Socket programming for distributed systems
5 Distributed programming
Socket
A socket is a virtual end point where entities can perform inter-process
communication. Sockets may connect processes on the same machine, or processes
on different continents. E.g., client/server.
Socket API
A standard API for accessing network services provided by lower layers (4-3-2).
TCP Client-server interaction
Sockets
Socket operations
Sockets
Families of socket
Sockets
import socket

# create a TCP socket (SOCK_STREAM)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print('Socket created')
Families of socket
• AF_INET – IPv4 Internet Protocols (32 bit addresses).
• AF_INET6 – IPv6 Internet Protocols (128 bit addresses).
• AF_UNIX – communication within the same machine
Types of Sockets
Sockets
import socket

# create a TCP socket (SOCK_STREAM)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print('Socket created')
Types of Sockets
• SOCK_STREAM – Reliable, bidirectional flow. Example: TCP.
• SOCK_DGRAM – Unreliable, connectionless datagram flow. Example: UDP.
• SOCK_RAW – Provides access to internal network protocol. Example: ICMP.
Assigning address (server)
Sockets
...
port = 12345  # port number
# Bind the socket to any address and the port on this machine
s.bind(('', port))
print("socket bound to %s" % port)
...
Request listening (server)
Sockets
...
s.listen(5)  # maximum queue of 5 pending connections
...
Accepting request (server)
Sockets
accept() – accepts a connection on a bound, listening socket. Waits (blocks) until a
connection is received.
...
conn, addr = s.accept()
...
Return values
• The return value is a pair, (conn, addr).
— conn – a new socket object usable to send and receive data on the connection.
— addr – the address bound to the socket on the other end of the connection
Sending connection request (client)
Sockets
...
host_ip = input("Enter target host: ")  # takes the server address from input
port = 12345  # server port number
c.connect((host_ip, port))
...
Sending data (TCP)
Sockets
send(bytes) – used to send data from one socket to another socket. The method can
only be used with a connected socket (e.g., TCP).
Parameters
• bytes – the data to be sent, as a bytes object. If the data is a string, call its encode()
method to convert it into bytes.
...
# Send data to server
data = "Hello Server!"
s.send(data.encode('utf-8'))
...
Sending data (UDP)
Sockets
...
# Send data to server
data = "Hello Server!"
s.sendto(data.encode('utf-8'), ("127.0.0.1", 12345))
...
Receiving data (TCP)
Sockets
recv(bufsize) – reads up to bufsize bytes from a connected socket.
Return value
• Returns the received data as a bytes object.
Receiving data (UDP)
Sockets
recvfrom(bufsize) – reads a datagram from a UDP socket.
Return value
• Returns a bytes object read from the UDP socket and the address of the client
socket, as a tuple.
Closing a connection
Sockets
close() – closes the connection with the host. Once that happens, all future operations
on the socket object will fail. The remote end will receive no more data.
...
# Close the connection with the client
conn.close()
...
Exercise: Echo protocol
Sockets
Echo client algorithm
Sockets
1. Read server IP and port number from standard input or use constants
2. Create a TCP socket
3. Connect the socket to the server
4. Repeat
— Read a line of text from the keyboard
— Send the line to the server
— Receive the response and print on the screen/console
Until ’stop’ or ’close’ line is read
5. Close the socket
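A minimal sketch implementing this algorithm (host, port, and buffer size are assumptions):

import socket

HOST, PORT = '127.0.0.1', 12345   # assumed server address

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
while True:
    line = input('> ')
    if line in ('stop', 'close'):
        break
    s.send(line.encode('utf-8'))
    reply = s.recv(1024)          # receive the echoed line
    print(reply.decode('utf-8'))
s.close()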
Sequential Echo server algorithm
Sockets
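The algorithm for this slide is not reproduced here; a minimal sequential echo server sketch consistent with the client above (the port is an assumption):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 12345))
s.listen(5)
while True:                      # serve one client at a time
    conn, addr = s.accept()
    while True:
        data = conn.recv(1024)
        if not data:             # client closed the connection
            break
        conn.send(data)          # echo the data back
    conn.close()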
Echo server on UDP
Sockets
Echo client on UDP
Sockets
1. Read the server IP address and port number from standard input or use constants
2. Create a UDP socket
3. Repeat
— Read a line of text from the keyboard
— Send the line to the server
— Receive the response and print on the screen/console
Until ’stop’ or ’close’ line is read
4. Close the socket
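A sketch of the UDP variant (not from the slides); there is no connection set-up, and the address is an assumption. Run the two parts as separate programs:

import socket

# server: echo every datagram back to its sender
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(('', 12345))
while True:
    data, addr = s.recvfrom(1024)
    s.sendto(data, addr)

# client: one datagram per line of input
c = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
c.sendto('Hello Server!'.encode('utf-8'), ('127.0.0.1', 12345))
data, addr = c.recvfrom(1024)
print(data.decode('utf-8'))
c.close()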
Exercise
Sockets
Concurrency
Sockets
File handling
Sockets
• Opening
— file_object = open(filename) – default is read
◦ E.g., myfile = open("article.txt", "r")
— File access modes
◦ read only (’r’), write only (’w’), append (’a’), read and write (’r+’), etc.
• Reading
— myfile.read([n]) – reads n bytes. If n is not specified, it reads the entire file.
— myfile.readline([n]) – reads a line of input. If n is specified reads at most n bytes.
— myfile.readlines() – reads all lines and returns them in a list.
• Writing
— myfile.write(data) – writes the string data on the file
— myfile.writelines(L) – writes all the string data from list L line-by-line.
• Closing
— myfile.close()
File transfer
Sockets
FTP server algorithm
Sockets
1. ...
2. Put the socket in passive mode
3. Repeat forever
— Accept a new connection
— Read the file_name from the connection
— Check if the file exists
◦ Get file size
◦ Send file size to the client
◦ Open the file for reading
◦ Repeat
– Send bytes
Until all bytes are transferred
◦ Close the file
— Close the connection
FTP client algorithm
Sockets
1. ...
2. Connect the socket to the server
3. Read the file_name from the keyboard
4. Send file_name to the server
5. Receive the file_size from the server
6. Open a new_file for writing
7. Repeat
◦ Receive bytes
Until all bytes are received
8. Close the new_file
9. Close the socket
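A compact sketch of both sides (not from the slides); the text size header and buffer size are assumptions, and a real protocol would frame the header explicitly:

import os

# server side: send the requested file, prefixed by its size
def serve_file(conn):
    file_name = conn.recv(1024).decode('utf-8')
    file_size = os.path.getsize(file_name)
    conn.send(str(file_size).encode('utf-8'))  # assumed: size sent as text
    with open(file_name, 'rb') as f:
        while True:
            chunk = f.read(4096)
            if not chunk:
                break
            conn.send(chunk)

# client side: request a file and write it locally
def fetch_file(sock, file_name, out_name):
    sock.send(file_name.encode('utf-8'))
    file_size = int(sock.recv(1024).decode('utf-8'))
    received = 0
    with open(out_name, 'wb') as f:
        while received < file_size:
            chunk = sock.recv(4096)
            received += len(chunk)
            f.write(chunk)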
Mini-project
Sockets
— The service must support more than two users, chatting at the same time.
— When any user joins the service, he/she should use a username, and the group must be
notified.
— All users must receive the messages (line-by-line) on their screen with the timestamp, the
username and the chat text.
— Users can join in the service using one of the two protocols (TCP and UDP), i.e., the service
should be listening on both TCP and UDP. Note: There can only be one group.
Mini-project
Sockets
2 Implement a distributed system that uses a master server to receive client requests and
two computing/storage servers.