DISTRIBUTED SYSTEMS PASCO - 16th April 2021
(3 marks)
Answer: A persistent object is an object that continues to exist even if it is currently not
contained in the address space of any server process. In other words, a persistent object is
not dependent on its current server. In practice, this means that the server that is currently
managing the persistent object can store the object's state on secondary storage and then
exit. Later, a newly started server can read the object's state from storage into its own
address space, and handle invocation requests. In contrast, a transient object is an object
that exists only as long as the server that is hosting the object. As soon as that server exits,
the object ceases to exist as well. In short, a persistent object's state survives a break in a session because it is stored for later use, whereas a transient object lives at most as long as its hosting server, and possibly for a shorter period.
(iii) Describe the main modes of operation for static and dynamic methods of invocation.
(4 marks)
Answer: Static invocations require that the interfaces of an object are known when the
client application is being developed. It also implies that if interfaces change, then the client
application must be recompiled before it can make use of the new interfaces. Static
invocation implies using object-based languages (e.g., Java) to predefine interface
definitions. Dynamic invocation, by contrast, composes the method invocation at runtime;
it is used to invoke methods on objects for which no stubs are available in the
application (e.g., to implement a bridge that works for all object types).
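A minimal Python sketch (illustrative, not part of the original answer) contrasting the two modes: the static call is fixed when the client is written against a known stub interface, while the dynamic call selects the method by name at runtime, as a generic bridge would.

    class FileObjectStub:
        # Client-side stub whose interface is known at development time.
        def append(self, data: str) -> None:
            print(f"marshalling append({data!r}) and sending it to the server")

    stub = FileObjectStub()

    # Static invocation: the method name is fixed in the client's source code.
    stub.append("hello")

    # Dynamic invocation: the method is chosen at runtime, so no
    # call-specific stub code needs to be compiled into the client.
    operation, args = "append", ("hello",)
    getattr(stub, operation)(*args)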
(i) List the correct consensus algorithms and explain each of them.
(ii) Enumerate and explain the economic and technical reasons for designing
distributed systems.
a. Cost: Better price/performance as long as commodity hardware is used for the
component computers
b. Performance: By using the combined processing and storage capacity of many
nodes, performance levels can be reached that are out of the scope of centralised
machines
c. Scalability: Resources such as processing and storage capacity can be increased
incrementally
d. Inherent distribution: Some applications like the Web are naturally distributed
e. Reliability: By having redundant components, the impact of hardware and
software faults on users can be reduced
(b)
(i) Differentiate between physical time, logical time and global time.
The physical time is tied to the notion of real time and can be used to order events
or find time difference between two events. They are used to adjust the time of
nodes. Each node in the system can share its local time with other nodes in the
system. The time is set based on UTC (Coordinated Universal Time), which is used
as a reference clock for the nodes in the system.
Logical time is derived from the notion of potential cause-effect relations between
events and is not tied to the notion of real time. Global time refers to a single,
system-wide notion of time that all nodes would agree on; since no such clock
physically exists in a distributed system, it can only be approximated, for example
by synchronising the nodes' physical clocks against UTC.
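As a concrete illustration of logical time, a minimal sketch of Lamport's logical clock rules (a standard construction, not part of the original answer): tick before each local event, and on receipt jump past the timestamp piggybacked on the message.

    class LamportClock:
        def __init__(self):
            self.time = 0

        def local_event(self) -> int:
            self.time += 1              # rule 1: tick before each local event
            return self.time

        def send(self) -> int:
            return self.local_event()   # timestamp piggybacked on the message

        def receive(self, msg_time: int) -> int:
            # rule 2: jump past the sender's timestamp if it is larger
            self.time = max(self.time, msg_time) + 1
            return self.time

    p, q = LamportClock(), LamportClock()
    t = p.send()             # p's clock becomes 1
    print(q.receive(t))      # q's clock jumps to 2, preserving cause-effect order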
(ii) What are the criteria for the detection of termination of a distributed algorithm,
needed to obtain simultaneous knowledge of all involved processes as well as to take
account of messages that may still traverse the network?
QUESTION TWO
(a) State four reasons why transparency can be dangerous in a distributed system.
Although distribution transparency is generally considered preferable for any
distributed system, there are situations in which attempting to blindly hide all
distribution aspects from users is not a good idea. A simple example is requesting your
electronic newspaper to appear in your mailbox before 7 A.M. local time, as usual,
while you are currently at the other end of the world living in a different time zone.
Your morning paper will not be the morning paper you are used to.
There is also a trade-off between a high degree of transparency and the performance of
a system. For example, many Internet applications repeatedly try to contact a server
before finally giving up. Consequently, attempting to mask a transient server failure
before trying another one may slow down the system as a whole. In such a case, it may
have been better to give up earlier, or at least let the user cancel the attempts to make
contact.
Several researchers have argued that hiding distribution will only lead to further
complicating the development of distributed systems, exactly for the reason that full
distribution transparency can never be achieved. A popular technique for achieving
access transparency is to extend procedure calls to remote servers. However, Waldo et
al. [64] already pointed out that attempting to hide distribution by means of such remote
procedure calls can lead to poorly understood semantics, for the simple reason that a
procedure call does change when executed over a faulty communication link.
As an alternative, various researchers and practitioners are now arguing for less
transparency, for example, by more explicitly using message-style communication, or
more explicitly posting requests to, and getting results from remote machines, as is
done in the Web when fetching pages.
A somewhat radical standpoint is taken by Wams [65] by stating that partial failures
preclude relying on the successful execution of a remote service. If such reliability
cannot be guaranteed, it is then best to always perform only local executions, leading
to the copy-before-use principle. According to this principle, data can be accessed only
after they have been transferred to the machine of the process wanting that data.
Moreover, a data item should not be modified in place; instead, it can only be updated to
a new version. It is not difficult to imagine that many other problems will surface.
However, Wams [65] shows that many existing applications can be retrofitted to this
alternative approach without sacrificing functionality.
The conclusion is that aiming for distribution transparency may be a nice goal when
designing and implementing distributed systems, but that it should be considered
together with other issues such as performance and comprehensibility. The price for
achieving full transparency may be surprisingly high.
(b) Enumerate five differences between replication and caching and state the advantages
and disadvantages of each of them.
Cache: A cache is a temporary storage location for copied information. A Web cache
is a dedicated computer system which monitors object requests and stores
objects as it retrieves them from the server. On subsequent requests, the cache
delivers objects from its storage rather than passing the request on to the origin server.
Caching Systems:
1. Reduce network latency by bringing content closer to the content consumer.
2. Are essentially reactive: a data object is cached only when a client requests it.
3. Meet traffic-reduction goals by fetching content only when it is requested.
4. Have consistency problems due to their reactive nature.
5. Can have reliability problems, as caches are normally placed at network entry
points and a cache failure may sometimes bring the whole network down.
Replication systems:
1. Know exactly when an object changes and push updated objects immediately.
2. Ensure content freshness due to their proactive nature.
3. Have very high fault tolerance due to replication of data, which ensures that
even if a web server goes down, requests can be redirected to another origin
server.
4. Knowledge of the persistent domain allows load balancing.
5. Consume more disk space.
6. Need efficient algorithms for load balancing.
7. May increase network traffic if multicast is not used judiciously.
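A minimal sketch (hypothetical names, illustrative only) of the reactive/proactive contrast drawn in the two lists above: the cache fetches an object only on a client miss and may later serve a stale copy, while the replicator is told about every change at the origin and stays fresh.

    origin = {"page": "v1"}

    class Cache:
        # Reactive: fills itself only when a client asks for a missing object.
        def __init__(self):
            self.store = {}

        def get(self, key):
            if key not in self.store:           # cache miss
                self.store[key] = origin[key]   # fetch from the origin on demand
            return self.store[key]              # may serve a stale copy later

    class Replica:
        # Proactive: the origin pushes every update immediately.
        def __init__(self):
            self.store = dict(origin)

        def push(self, key, value):
            self.store[key] = value             # stays fresh by construction

    cache, replica = Cache(), Replica()
    print(cache.get("page"))                    # "v1", fetched on first request
    origin["page"] = "v2"
    replica.push("page", "v2")                  # replication propagates at once
    print(cache.get("page"), replica.store["page"])   # stale "v1" vs fresh "v2"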
Data Replication
Data replication refers to the process of storing and maintaining multiple copies of
crucial data across different machines. It helps organisations ensure high data
availability and accessibility at all times, allowing data to be accessed and
recovered even during an unforeseen disaster or data loss, and it improves system
resilience and reliability.
Replication also helps avoid additional disk-based I/O operations by removing the
dependency on the master node alone, thereby making the entire process more durable.
Replication also brings challenges:
2. Time Consuming
Carrying out the tedious task of replication without any bugs or errors requires you
to set up a replication pipeline. Setting up a replication pipeline that operates
correctly can be time-consuming and can even take months, depending upon your
replication needs and the task complexities. Further, maintaining patience and keeping
all the stakeholders on the same page for this period can turn out to be a challenge
even for big organisations.
4. Technical Lags
One of the biggest challenges that an organisation faces when replicating its data is
technical lags. Replication usually involves leveraging master nodes and slave nodes.
The master node acts as the data source and represents the point where the data flow
starts and reaches the slave nodes. These slave nodes usually face some lag associated
with the data coming from the master node. Such lags can occur depending upon the
system configurations and can range from a few records to hundreds of data records.
Since the slave nodes often suffer from some lag, they face delays and do not update
the data in real time. Lags are a common issue with most systems and applications,
but they can be quite troublesome in cases such as the following:
If you are shopping on an e-commerce website and add products to your cart, but the
products disappear when you reach the checkout stage, this happens due to replication
lag at the slave node.
If you are working with a transactional data flow and the transactions you have made
take time to reflect at the destination, this likewise happens due to replication lag
at the slave node.
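A minimal sketch (simplified, names assumed) of the lag described above: the master acknowledges a write before the slave has applied it, so a read routed to the slave, as in the shopping-cart example, briefly misses the new data.

    from collections import deque

    master, slave = {}, {}
    replication_queue = deque()       # updates in flight from master to slave

    def write(key, value):
        master[key] = value                      # acknowledged immediately
        replication_queue.append((key, value))   # applied to the slave later

    def apply_one_update():
        key, value = replication_queue.popleft()
        slave[key] = value

    write("cart", ["laptop"])
    print(slave.get("cart"))    # None: the slave still lags behind the master
    apply_one_update()
    print(slave.get("cart"))    # ['laptop'] once replication catches up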
QUESTION THREE
(a) In a high performance distributed system, differentiate between a system fault and a
system failure.
A system fault or error is a part of a system’s state that may lead to a failure. For
example, when transmitting packets across a network, it is to be expected that some
packets have been damaged when they arrive at the receiver. Damaged in this context
means that the receiver may incorrectly sense a bit value (e.g., reading a 1 instead of a
0), or may even be unable to detect that something has arrived.
A system failure occurs when the system cannot meet its promises. In particular, if a
distributed system is designed to provide its users with a number of services, the system
has failed when one or more of those services cannot be (completely) provided.
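To make the damaged-packet example concrete, a minimal sketch (illustrative only) of detecting such a fault with a CRC-32 checksum: a single flipped bit changes the checksum, so the receiver can recognise the fault before it turns into a failure.

    import zlib

    payload = b"distributed systems"
    packet = payload + zlib.crc32(payload).to_bytes(4, "big")

    # Simulate one bit flipped in transit (a fault, not yet a failure).
    damaged = bytes([packet[0] ^ 0x01]) + packet[1:]

    def is_intact(pkt: bytes) -> bool:
        data, checksum = pkt[:-4], int.from_bytes(pkt[-4:], "big")
        return zlib.crc32(data) == checksum

    print(is_intact(packet))    # True: the packet arrived undamaged
    print(is_intact(damaged))   # False: the receiver detects the damage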
(b) Explain the following parameters and state how they can be prevented in a typical
distributed system.
(i) Process failure
Process failure: A process fails when it crashes — it is assumed that a crashed
process will make no further progress on its program. A crash is considered to
be clean if the process either functions correctly or has halted. A crash is termed
a fail-stop if other processes can detect with certainty that the process has
crashed. It can be mitigated by restarting the crashed process as soon as possible
and by repairing the failure point and any incorrect state.
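A minimal sketch (assumed design, not from the source) of the detection side: a heartbeat-based failure detector suspects a process has crashed, in the fail-stop sense, once no heartbeat arrives within a timeout, after which a supervisor can restart it.

    import time

    HEARTBEAT_TIMEOUT = 2.0   # seconds without a heartbeat before suspecting a crash
    last_heartbeat = {}       # process id -> time the last heartbeat was received

    def record_heartbeat(pid: str) -> None:
        last_heartbeat[pid] = time.monotonic()

    def suspected_crashed(pid: str) -> bool:
        return time.monotonic() - last_heartbeat[pid] > HEARTBEAT_TIMEOUT

    record_heartbeat("worker-1")
    print(suspected_crashed("worker-1"))   # False: the heartbeat is recent
    # If worker-1 crashes and stops sending heartbeats, after 2 s the
    # supervisor sees suspected_crashed("worker-1") == True and restarts it.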
(c) What are the main differences between a physical clock and logical clock?
The physical clock is tied to the notion of real time and can be used to order events or
find time difference between two events. They are used to adjust the time of nodes.
Each node in the system can share its local time with other nodes in the system. The
time is set based on UTC (Coordinated Universal Time), which is used as a reference
clock for the nodes in the system. A logical clock, in contrast, is derived from the
notion of potential cause-effect relations between events and is not tied to the
notion of real time. It is
a numerical software counter value maintained in each process. Conceptually, this
logical clock can be thought of as a clock that only has meaning in relation to messages
moving between processes. When a process receives a message, it re-synchronizes its
logical clock with that sender.
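A minimal sketch (assumed setup) of how a node might adjust its physical clock against a UTC reference, in the spirit of Cristian's algorithm: ask a time server and compensate for half the measured round-trip delay.

    import time

    def utc_time_server() -> float:
        # Stands in for a remote server that holds a UTC reference clock.
        return time.time()

    def synchronised_time() -> float:
        t_request = time.monotonic()
        server_time = utc_time_server()   # in reality, a network request
        t_reply = time.monotonic()
        rtt = t_reply - t_request
        # Cristian's estimate: the server's reading plus half the round trip.
        return server_time + rtt / 2

    print(synchronised_time())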
QUESTION FOUR
(a) Servers hosting popular websites often receive more requests than they can handle,
causing their performance to suffer. A typical way of overcoming this problem is to
replicate the contents on other servers. Describe four (4) problems that may be
introduced and prescribe possible solutions to the problems you mention.
➜ Consistency (how to deal with updated data)
➜ Update propagation
Replica placement
➜ How many replicas?
➜ Where to put them?
Redirection/Routing
➜ Which replica should clients use?
(b) Explain how the performance of a system may be improved by maintaining consistency.
(c) Enumerate the economic and technical reasons for designing a distributed system.
QUESTION TWO
(a) Give a brief description of the main layers of functionality of a vertical
distributive architecture and a horizontal distributive architecture, and state the
main function of each of them.
Vertical Distributive Architecture
A client request is sent to the first server. During processing of the request this server
will request the services of the next server, who will do the same, until the final server
is reached. In this way the various servers become clients of each other (see Figure 3).
Each server is responsible for a different step (or tier) in the fulfilment of the original
client request.
Client tier: used to initiate the request.
Application server: used to transmit the client's request to the database server.
Database server: used to process the client's request sent by the application server.
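(For contrast, in a horizontal distributive architecture a client or server is split into logically equivalent parts, for example a farm of replicated web servers behind a load balancer, each operating on its own share of the complete data set so that load is spread across machines.)

A minimal sketch (function names assumed, illustrative only) of a request flowing through the three vertical tiers described above, with each tier acting as a client of the next server:

    def database_server(query: str) -> str:
        # Tier 3: processes the query sent by the application server.
        return f"rows matching {query!r}"

    def application_server(request: str) -> str:
        # Tier 2: turns the client's request into a query and forwards it.
        query = f"SELECT * WHERE item = '{request}'"
        return database_server(query)

    def client(request: str) -> str:
        # Tier 1: initiates the request.
        return application_server(request)

    print(client("books"))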
(c) Explain the role of an Edge-server network and state its functions.
Edge-server networks: In edge-server networks, as the name implies, servers are
placed at the “edge” of the Internet, for example at internet service providers (ISPs) or
close to enterprise networks. Client nodes (e.g., home users or an enterprise’s
employees) then access the nearby edge servers instead of the original server (which
may be located far away). This architecture is typically well suited for large-scale
content-distribution networks (e.g., Akamai).
Functions:
1. Mostly used for content and application distribution.
2. Serving as the building blocks of content-distribution networks (CDNs).
(20 marks)
QUESTION THREE
(a) Communication in a distributed system involves the transfer of information in
synchronous and asynchronous modes. Explain the mode of transmission in each case.
QUESTION FOUR
(a) Servers hosting popular websites often receive more requests than they can handle,
causing their performance to suffer. A typical way of handling this problem is to
replicate the contents on other servers.
(i) Describe three (3) problems that may be introduced in the websites mentioned
in (a) and prescribe possible solutions to the problems you mention. (5 marks)
(ii) Explain how the performance of a system may be improved by maintaining
consistency. (5 marks)
(b)
(i) What are the functions of a client-centric consistency model? (3 marks)
Client-centric Consistency Model defines how a data-store presents the data
value to an individual client when the client process accesses the data value
across different replicas.
Client-centric consistency provides guarantees for a single client
concerning the consistency of accesses to a data store by that client.
Assumption: Clients can access different replicas e.g. mobile users.
Eventual consistency for replicated data is fine if clients always access
the same replica.
(ii) With the aid of a diagram, explain the parameters of a client-centric consistency
model. (3 marks)
-
(iii) With the aid of a diagram, differentiate between monotonic reads and
monotonic writes and state the importance of each of them. (4 marks)
In a monotonic-read consistent store, if a process reads the value of a data item x,
any successive read operation on x by that process will always return that same value
or a more recent value; this matters because a client never sees older data than it
has already seen (e.g., a mailbox that never appears to lose previously read mail when
accessed from another replica). In a monotonic-write consistent store, the following
condition holds: a write operation on data item x is completed before any successive
write on x by the same client. All writes by a single client are sequentially ordered.
Thus completing a write operation means that the copy on which a successive operation
is performed reflects the effect of a previous write operation by the same process,
no matter where that operation was initiated. Monotonic writes matter because, for
example, successive updates to a replicated software library must be applied in order
on every copy.
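A minimal sketch (simplified to a single data item, names assumed) of enforcing monotonic reads on the client side: the session remembers the highest version it has seen and rejects any replica whose copy is older.

    class Replica:
        def __init__(self, version: int, value: str):
            self.version, self.value = version, value

    class Session:
        # Tracks the newest version this client has read (monotonic reads).
        def __init__(self):
            self.last_seen = 0

        def read(self, replica: Replica) -> str:
            if replica.version < self.last_seen:
                raise RuntimeError("replica too stale for this session")
            self.last_seen = replica.version
            return replica.value

    fresh = Replica(2, "mailbox with 2 messages")
    stale = Replica(1, "mailbox with 1 message")
    s = Session()
    print(s.read(fresh))       # ok: version 2 becomes the session floor
    try:
        s.read(stale)          # would violate monotonic reads
    except RuntimeError as err:
        print("rejected:", err)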
(b)
(i) With the aid of a simple design of a distributed architectural object model, explain
the main functions of a client within the model. (4 marks)