DISTRIBUTED SYSTEMS PASCO - 16th April, 2021

(ii) Differentiate between persistent and transient objects.

(3marks)
Answer: A persistent object is an object that continues to exist even if it is currently not
contained in the address space of any server process. In other words, a persistent object is
not dependent on its current server. In practice, this means that the server that is currently
managing the persistent object, can store the object's state on secondary storage and then
exit. Later, a newly started server can read the object's state from storage into its own
address space, and handle invocation requests. In contrast, a transient object is an object
that exists only as long as the server that is hosting the object. As soon as that server exits,
the object ceases to exist as well. With a persistent object, even if there is a break in
session, its state is stored somewhere for later use, whereas a transient object exists only
as long as its server exists, and possibly for a shorter period of time.

(iii) Describe the main mode of operations for static and dynamic methods of invocation.
(4marks)
Answer: Static invocations require that the interfaces of an object are known when the
client application is being developed. It also implies that if interfaces change, then the client
application must be recompiled before it can make use of the new interfaces. Static
invocation implies using object-based languages (e.g., Java) to predefine interface
definitions.

Dynamic invocation, by contrast, permits composing a method invocation at run-time. It is
used to invoke methods on objects for which no stubs are available in the application (e.g.,
to implement a bridge that works for all object types), and when the client object
requests a service (by description) but
does not know the specific object ID or class of object to satisfy the request. This supports
runtime-configured applications where the relationships between components are not
determined at design time, that is, components are not tightly coupled. Dynamic invocation
generally takes a form such as invoke(object, method, input_parameters,
output_parameters), where object identifies the distributed object, method is a parameter
specifying exactly which method should be invoked, input_parameters is a data structure
that holds the values of that method's input parameters, and output_parameters refers to a
data structure where output values can be stored.
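As a loose illustration of this style (not the generic invoke() primitive itself), Java's reflection API composes a call at run-time from a method name; the helper below and its names are hypothetical:

import java.lang.reflect.Method;

public class DynamicInvoke {
    // Pick a method by name at run time and invoke it on an arbitrary object;
    // the returned value plays the role of output_parameters.
    static Object invoke(Object object, String method, Object... inputParameters)
            throws Exception {
        for (Method m : object.getClass().getMethods()) {
            if (m.getName().equals(method)
                    && m.getParameterCount() == inputParameters.length) {
                return m.invoke(object, inputParameters);
            }
        }
        throw new NoSuchMethodException(method);
    }

    public static void main(String[] args) throws Exception {
        // "toUpperCase" is only a string here: no stub or compile-time
        // knowledge of the target interface is needed.
        System.out.println(invoke("hello", "toUpperCase")); // prints HELLO
    }
}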
(iv) Within the methods of invocation, describe the main procedure at the client and the server
side of the system. (4marks)
Remote Method Invocation (RMI) is an API which allows an object to invoke a method on
an object that exists in another address space, which could be on the same machine or on a
remote machine.
Step 1: Defining the remote interface
The first thing to do is to create an interface which will provide the description of the methods
that can be invoked by remote clients.
Step 2: Implementing the remote interface
The next step is to implement the remote interface. To implement the remote interface, the
class should extend the UnicastRemoteObject class of the java.rmi package. Also, a default
constructor needs to be created that throws java.rmi.RemoteException, since its parent
constructor may throw it.
Step 3: Creating Stub and Skeleton objects from the implementation class using rmic
The rmic tool is used to invoke the RMI compiler that creates the Stub and Skeleton objects.
Its prototype is rmic classname. For the above program, the following command needs to be
executed at the command prompt: rmic SearchQuery
Step 4: Start the RMI registry
Start the registry service by issuing the following command at the command prompt:
start rmiregistry
Step 5: Create and execute the server application program
The next step is to create the server application program and execute it on a separate command
prompt. The server program uses createRegistry method of LocateRegistry class to create RMI
registry within the server JVM with the port number passed as argument. The rebind method
of Naming class is used to bind the remote object to the new name.
Step 6: Create and execute the client application program
The last step is to create the client application program and execute it on a separate command
prompt. The lookup method of the Naming class is used to get the reference of the Stub object.
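The steps above can be sketched in Java as follows; the SearchQuery name comes from the text, while the method signature, the port, and the binding name are illustrative assumptions:

import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.server.UnicastRemoteObject;

// Step 1: the remote interface describes the methods remote clients may invoke.
interface SearchQuery extends Remote {
    String query(String keyword) throws RemoteException;
}

// Step 2: the implementation extends UnicastRemoteObject; the constructor
// declares RemoteException because the parent constructor may throw it.
class SearchQueryImpl extends UnicastRemoteObject implements SearchQuery {
    SearchQueryImpl() throws RemoteException { super(); }

    public String query(String keyword) throws RemoteException {
        return "results for: " + keyword;
    }
}

// Steps 4-5: the server creates an RMI registry in its own JVM and publishes
// the remote object under a well-known name with rebind().
class Server {
    public static void main(String[] args) throws Exception {
        LocateRegistry.createRegistry(1099);               // Step 4
        Naming.rebind("rmi://localhost:1099/SearchQuery",  // Step 5
                      new SearchQueryImpl());
        System.out.println("Server ready");
    }
}

// Step 6: the client looks up the stub by name and invokes it as if local.
class Client {
    public static void main(String[] args) throws Exception {
        SearchQuery stub =
            (SearchQuery) Naming.lookup("rmi://localhost:1099/SearchQuery");
        System.out.println(stub.query("distributed systems"));
    }
}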
CSM 557 – DISTRIBUTED SYSTEMS
DECEMBER, 2012
QUESTION ONE
(a) In a corporate banking institution, explain how you will design an appropriate
distributed system for the bank indicating the essential features of your system and how
the components can be made to work efficiently.

Multi-tiered client-server architectures are a direct consequence of dividing applications into a
user-interface, processing components, and a data level. The different tiers correspond directly
with the logical organization of applications. In many business environments, distributed
processing is equivalent to organizing a client-server application as a multitiered architecture.
We refer to this type of distribution as vertical distribution; such applications are n-tier applications. The
characteristic feature of vertical distribution is that it is achieved by placing logically different
components on different machines. Again, from a system management perspective, having a
vertical distribution can help: Functions are logically and physically split across multiple
machines, where each machine is tailored to a specific group of functions. In modern
architectures (Horizontal Distribution) a client or server may be physically split up into
logically equivalent parts, but each part is operating on its own share of the complete data set,
thus balancing the load.

(i) List the correct consensus algorithms and explain each of them
(ii) Enumerate and explain the economic and technical reasons for designing
distributed systems.
a. Cost: Better price/performance as long as commodity hardware is used for the
component computers
b. Performance: By using the combined processing and storage capacity of many
nodes, performance levels can be reached that are out of the scope of centralised
machines
c. Scalability: Resources such as processing and storage capacity can be increased
incrementally
d. Inherent distribution: Some applications like the Web are naturally distributed
e. Reliability: By having redundant components, the impact of hardware and
software faults on users can be reduced

(iii) Briefly explain why unpredictable communication latencies affect the
synchronization of a distributed system.
Network latency occurs because packets have to traverse through several
communication hardware devices and be processed by many software components
while traveling from source to destination. A consequence of network latency is
the problem of out-of-date state information in which important information such
as the queue size at a given node, may change significantly by the time this
information is acted upon. Since the effectiveness of load-sharing algorithms is
predicated on the accuracy and timeliness of state information, it is important to
investigate the consequences of out-of-date state information on the performance
of load sharing.

(b)
(i) Differentiate between physical time, logical time and global time.
The physical time is tied to the notion of real time and can be used to order events
or find time difference between two events. They are used to adjust the time of
nodes. Each node in the system can share its local time with other nodes in the
system. The time is set based on UTC (Coordinated Universal Time), which is used
as a reference clock for the nodes in the system.

Logical time is derived from the notion of potential cause-effect relations between
events and is not tied to the notion of real time.
(ii) What are the criteria for the detection of termination of a distributed algorithm,
needed to obtain simultaneous knowledge of all involved processes as well as take
account of messages that may still traverse the network?

QUESTION TWO
(a) State four reasons why transparency can be dangerous in a distributed system.
Although distribution transparency is generally considered preferable for any
distributed system, there are situations in which attempting to blindly hide all
distribution aspects from users is not a good idea. A simple example is requesting your
electronic newspaper to appear in your mailbox before 7 A.M. local time, as usual,
while you are currently at the other end of the world living in a different time zone.
Your morning paper will not be the morning paper you are used to.

Likewise, a wide-area distributed system that connects a process in San Francisco to a
process in Amsterdam cannot be expected to hide the fact that Mother Nature will not
allow it to send a message from one process to the other in less than approximately 35
ms. Practice shows that it actually takes several hundred milliseconds using a computer network.
Signal transmission is not only limited by the speed of light, but also by limited
processing capacities and delays in the intermediate switches.

There is also a trade-off between a high degree of transparency and the performance of
a system. For example, many Internet applications repeatedly try to contact a server
before finally giving up. Consequently, attempting to mask a transient server failure
before trying another one may slow down the system as a whole. In such a case, it may
have been better to give up earlier, or at least let the user cancel the attempts to make
contact.

Another example is where we need to guarantee that several replicas, located on
different continents, must be consistent all the time. In other words, if one copy is
changed, that change should be propagated to all copies before allowing any other
operation. It is clear that a single update operation may now even take seconds to
complete, something that cannot be hidden from users.
Finally, there are situations in which it is not at all obvious that hiding distribution is a
good idea. As distributed systems are expanding to devices that people carry around
and where the very notion of location and context awareness is becoming increasingly
important, it may be best to actually expose distribution rather than trying to hide it. An
obvious example is making use of location-based services, which can often be found
on mobile phones, such as finding the nearest Chinese take-away or checking whether
any of your friends are nearby.

Several researchers have argued that hiding distribution will only lead to further
complicating the development of distributed systems, exactly for the reason that full
distribution transparency can never be achieved. A popular technique for achieving
access transparency is to extend procedure calls to remote servers. However, Waldo et
al. [64] already pointed out that attempting to hide distribution by means of such remote
procedure calls can lead to poorly understood semantics, for the simple reason that a
procedure call does change when executed over a faulty communication link.

As an alternative, various researchers and practitioners are now arguing for less
transparency, for example, by more explicitly using message-style communication, or
more explicitly posting requests to, and getting results from remote machines, as is
done in the Web when fetching pages.

A somewhat radical standpoint is taken by Wams [65] by stating that partial failures
preclude relying on the successful execution of a remote service. If such reliability
cannot be guaranteed, it is then best to always perform only local executions, leading
to the copy-before-use principle. According to this principle, data can be accessed only
after they have been transferred to the machine of the process wanting that data.
Moreover, modifying a data item should not be done. Instead, it can only be updated to
a new version. It is not difficult to imagine that many other problems will surface.
However, Wams [65] shows that many existing applications can be retrofitted to this
alternative approach without sacrificing functionality.

The conclusion is that aiming for distribution transparency may be a nice goal when
designing and implementing distributed systems, but that it should be considered
together with other issues such as performance and comprehensibility. The price for
achieving full transparency may be surprisingly high.

(b) Enumerate five differences between replication and caching and state the advantages
and disadvantages of each of them.

Cache: A cache is a temporary storage location for copied information. A Web cache
is a dedicated computer system which monitors object requests and stores objects
as it retrieves them from the server. On subsequent requests, the cache will
deliver objects from its storage rather than passing the request to the origin server.
Caching Systems:
1. Reduce network latency by bringing content closer to the content consumer.
2. Caches are essentially reactive: a data object is cached only when a
client requests it.
3. Meet traffic-reduction goals by only fetching content when requested.
4. Caches have consistency problems due to their reactive nature.
5. Caches can have reliability problems, as they are normally placed at network
entry points and a cache failure may sometimes bring the whole network down.
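As a loose sketch of this reactive behaviour (the Origin interface and all names are hypothetical, standing in for a real HTTP fetch):

import java.util.HashMap;
import java.util.Map;

// Hypothetical origin lookup; in practice this would be an HTTP fetch.
interface Origin {
    String fetch(String url);
}

// A reactive cache: an object is stored only after a client first asks for it.
class WebCache {
    private final Map<String, String> store = new HashMap<>();
    private final Origin origin;

    WebCache(Origin origin) { this.origin = origin; }

    String get(String url) {
        // Serve from local storage on a hit; otherwise pass the request
        // to the origin server and remember the result for next time.
        return store.computeIfAbsent(url, origin::fetch);
    }
}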
Replication systems:
1. Know exactly when an object changes and push the object immediately.
2. Ensure content freshness due to their proactive nature.
3. Have very high fault tolerance due to replication of data, which ensures that
even if a web server goes down, requests can be redirected to another origin
server.
4. Knowledge of the persistent domain allows load balancing.
5. Consume more disk space.
6. Need efficient algorithms for load balancing.
7. May increase network traffic if multicast is not used judiciously.
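By contrast with the reactive cache above, a proactive replication scheme pushes every change to all replicas immediately; a minimal sketch, with all names illustrative:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Proactive replication: the primary knows exactly when an object changes
// and pushes the new value to every replica before the write completes.
class Primary {
    private final Map<String, String> data = new HashMap<>();
    private final List<Map<String, String>> replicas = new ArrayList<>();

    void attach(Map<String, String> replica) { replicas.add(replica); }

    void write(String key, String value) {
        data.put(key, value);
        for (Map<String, String> r : replicas) {
            r.put(key, value);   // push the update immediately
        }
    }
}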
Data Replication
Data Replication refers to the process of storing and maintaining multiple copies of
your crucial data across different machines. It helps organisations ensure high data
availability and accessibility at all times, thereby allowing them to access and
recover data even during an unforeseen disaster or data loss, and it improves
system resilience and reliability.

Advantages of Data Replication


1. Better Application Reliability
Replicating your data across various machines helps ensure that you can access the
data with ease, even when a hardware or machinery failure occurs, thereby boosting
the reliability of your system.

2. Better Transactional Commit Performance
When you're working with transactional data, you need to coordinate various
synchronous processes to ensure that data updates take place everywhere at the
same time; your application must write the commit before the control threads
can continue their tasks.

Replication helps avoid such additional disk-based I/O operations by removing the
data dependency on the master node alone, thereby making the entire process more
durable.

3. Better Read Performance
With replication in place, data reads can be routed across the numerous machines
that are part of the network, thereby improving the read performance of your
application. Hence, readers working on remote networks can fetch and read data with
ease. This application of data replication also helps reduce cache misses and lower
input/output operations on the replicas, as each replica only needs to cache its own
part of the data.

4. Data Durability Guarantee


Replication helps boost and ensure robust data durability, as data changes/updates
take place on multiple machines simultaneously instead of on a single computer. It
thereby provides more processing and computation power, leveraging numerous CPUs
and disks to ensure that the replication, transformation and loading processes take
place correctly.

5. Robust Data Recovery


Organizations depend on a diverse set of software and hardware to help them carry
out their daily operations, and hence fear any unforeseen data breaches or losses.
Data recovery is thus one of the biggest challenges and fears that all organisations face.
Replication allows users to maintain backups of their data that update in real-time,
thereby allowing them to access current and up-to-date data, even during any failures/
data losses.

Disadvantages of Replicating Data


1. High Cost.
Replicating data requires you to invest in numerous hardware and software
components such as CPUs, storage disks, etc., along with a complete technical setup
to ensure a smooth replication process. It further requires you to invest in acquiring
more manpower with a strong technical background. All such requirements make
the process of replicating data challenging, even for big organisations.

2. Time Consuming.
Carrying out the tedious task of replication without any bugs, errors, etc., requires you
to set up a replication pipeline. Setting up a replication pipeline that operates correctly
can be a time-consuming task and can even take months, depending upon your
replication needs and the task complexities. Further, remaining patient and keeping all
the stakeholders on the same page for this period can turn out to be a challenge even
for big organisations.

3. High Bandwidth Requirement.


With replication taking place, a large amount of data flows from your data source to
the destination database. To ensure a smooth flow of information and prevent any loss
of data, having sufficient bandwidth is necessary. Maintaining bandwidth capable of
supporting and processing large volumes of complex data while carrying out the
replication process can be a challenging task, even for large organisations.

4. Technical Lags
One of the biggest challenges that an organization faces when replicating their data is
technical lags. Replication usually involves leveraging master nodes and slave nodes.
The master node acts as the data source and represents the point where the data flow
starts and reaches the slave nodes. These slave nodes usually face some lag associated
with the data coming from the master node. Such lags can occur depending upon the
system configurations and can range from a few records to hundreds of data records.

Since the slave nodes often suffer from some lag, they often face delays and do not
update the data in real-time. Lags are a common issue with most systems and
applications. However, they can be quite troublesome in cases as follows:

In case you’re shopping on an e-commerce website, and you add products to your
cart, but upon reaching the checkout stage, the “products” disappear. This happens
due to a lag in replication in the slave node.
In case you’re working with a transactional data flow, the transactions you might have
made are taking time to reflect at the destination. This happens due to a lag in
replication in the slave node.

QUESTION THREE
(a) In a high performance distributed system, differentiate between a system fault and a
system failure.
A system fault or error is a part of a system’s state that may lead to a failure. For
example, when transmitting packets across a network, it is to be expected that some
packets have been damaged when they arrive at the receiver. Damaged in this context
means that the receiver may incorrectly sense a bit value (e.g., reading a 1 instead of a
0), or may even be unable to detect that something has arrived.

A system failure is when the system cannot meet its demands. In particular, if a
distributed system is designed to provide its users with a number of services, the system
has failed when one or more of those services cannot be (completely) provided.

(b) Explain the following parameters and state how they can be prevented in a typical
distributed system.
(i) Process failure
Process failure: A process fails when it crashes; it is assumed that a crashed
process will make no further progress on its program. A crash is considered to
be clean if the process either functions correctly or has halted. A crash is termed
a fail-stop if other processes can detect with certainty that the process has
crashed. Process failure can be mitigated by restarting the failed process as soon
as possible and by identifying and repairing the faulty state at the point of failure.

(ii) Failure masking


Failure masking: involves hiding the occurrence of failures from other
processes. Having a group of identical processes allows us to mask one or more
faulty processes in that group. In other words, we can replicate processes and
organize them into a group to replace a single (vulnerable) process with a (fault
tolerant) group.
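The process-group idea above can be sketched as simple majority voting over replica replies; a minimal sketch, with all names illustrative and three replicas assumed:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Mask faulty group members by majority vote: as long as a majority of
// replicas return the correct answer, a minority of faults stays hidden.
class Voter {
    static <T> T majority(List<T> replies) {
        Map<T, Integer> counts = new HashMap<>();
        for (T r : replies) counts.merge(r, 1, Integer::sum);
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            if (e.getValue() > replies.size() / 2) return e.getKey();
        }
        throw new IllegalStateException("no majority: too many faults");
    }

    public static void main(String[] args) {
        // Three replicas compute the same request; one is faulty.
        System.out.println(majority(List.of(42, 42, 7)));  // prints 42
    }
}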

(iii) Storage failure


Storage failure: Storage failure occurs when the stored information cannot be
accessed. This failure is sometimes caused by parity errors, head crashes, or dirt
particles settled on the medium. It can be prevented by reconstructing content
from an archive and a log of activities, and by using mirrored (replicated) disk systems.
(iv) Communication failure
Communication failure: Communication failure happens when a site
cannot communicate with another operational site in the network. It is
typically caused by the failure of the switching nodes and/or the links of the
communication system. To prevent this, use rerouting and error-resistant
communication protocols.

(c) What are the main differences between a physical clock and logical clock?
The physical clock is tied to the notion of real time and can be used to order events or
find time difference between two events. They are used to adjust the time of nodes.
Each node in the system can share its local time with other nodes in the system. The
time is set based on UTC (Coordinated Universal Time), which is used as a reference
clock for the nodes in the system. A logical clock, by contrast, is derived from the
notion of potential cause-effect relations between events and is not tied to the notion
of real time. It is a numerical software counter value maintained in each process.
Conceptually, this logical clock can be thought of as a clock that only has meaning in
relation to messages moving between processes. When a process receives a message,
it re-synchronizes its logical clock with that of the sender.
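A minimal sketch of such a logical clock, following the re-synchronization rule just described (Lamport's scheme; the class name is illustrative):

import java.util.concurrent.atomic.AtomicLong;

// A Lamport logical clock: a per-process software counter that only has
// meaning in relation to the messages moving between processes.
class LamportClock {
    private final AtomicLong time = new AtomicLong(0);

    // A local event or a message send simply advances the counter.
    long tick() {
        return time.incrementAndGet();
    }

    // On receiving a message, re-synchronize with the sender's timestamp:
    // jump ahead of anything already seen, then count the receive event.
    long onReceive(long senderTime) {
        return time.updateAndGet(local -> Math.max(local, senderTime) + 1);
    }
}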

QUESTION FOUR
(a) Servers hosting popular websites often receive more requests than they can handle,
causing their performance to suffer. A typical way of overcoming this problem is to
replicate the contents on other servers. Describe four (4) problems that may be
introduced and prescribe possible solutions to the problems you mention.
➜ Consistency (how to deal with updated data)
➜ Update propagation
Replica placement:
➜ How many replicas?
➜ Where to put them?
Redirection/routing:
➜ Which replica should clients use?
(b) Explain how the performance of system may be improved by maintaining consistency.
(c) Enumerate the economic and technical reasons for designing a distributed system.

KWAME NKRUMAH UNIVERSITY OF SCIENCE AND TECHNOLOGY,


KUMASI
COLLEGE OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
END OF FIRST SEMESTER EXAMINATION
CSM 557 – DISTRIBUTED SYSTEMS
MPHIL. COMPUTER SCIENCE

DECEMBER, 2013 TOTAL TIME ALLOWED: 3 HOURS

ANSWER THREE QUESTIONS ONLY


QUESTION ONE
(a) In a typical processing factory, list the essential features required for the design of an
appropriate distributed system for effective performance.
(b) In a typical client-server architecture, give a simple design and state the roles of each
of the components.
(c) From your answer in (b) above,
Write a simple client-server code in C, C++ or Java to control the communication
between the client and the server. (20marks)
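The original provides no answer here; as a hedged sketch, a minimal Java TCP echo server and client (the port and all names are arbitrary assumptions) could look like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// A minimal TCP echo server: accepts one client at a time and echoes lines.
class EchoServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(5000)) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(
                             client.getOutputStream(), true)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        out.println("echo: " + line);  // reply to the client
                    }
                }
            }
        }
    }
}

// The client connects, sends one request, and prints the server's reply.
class EchoClient {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 5000);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("hello server");
            System.out.println(in.readLine());  // prints: echo: hello server
        }
    }
}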

QUESTION TWO
(a) Give a brief description of the main layers of functionality of a vertical Distributive
architecture and horizontal distributive architecture and state the main function of each
of them.
Vertical Distributive Architecture
A client request is sent to the first server. During processing of the request this server
will request the services of the next server, who will do the same, until the final server
is reached. In this way the various servers become clients of each other (see Figure 3).
Each server is responsible for a different step (or tier) in the fulfilment of the original
client request.
 Client: used to initiate requests
 Application server: used to transmit client requests to the database server
 Database server: used to process the client requests sent by the application server.

Horizontal Distributive Architecture


Horizontal distribution involves replicating a server's functionality or contents over
multiple computers. In this case, each machine contains a copy of the server's
information. Requests from clients are passed to the various servers on the network.
(b) With the aid of a suitable diagram, state and explain the main functions of the various
parts of a collaborative distributed system.
Collaborative distributed systems: In collaborative distributed systems, peers
typically support each other to deliver content in a peer-to-peer-like architecture, while
they use a client-server architecture for the initial setup of the network. In BitTorrent,
for example, nodes requesting to download a file from a server first contact the server
to get the location of a tracker. The tracker then tells the nodes the locations of other
nodes, from which chunks of the content can be downloaded concurrently. Nodes must
then offer downloaded chunks to other nodes and are registered with the tracker, so that
the other nodes can find them.
 Node downloads chunks of a file from many other nodes
 Node provides downloaded chunks to other nodes
 Tracker keeps track of active nodes that have chunks of file
 Enforce collaboration by penalizing selfish nodes
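A toy sketch of the tracker's bookkeeping role described above (it stores no content, only which nodes hold which chunks; all names are illustrative):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A toy tracker: it keeps track of which active nodes hold chunks of a file,
// so that newcomers can download chunks from several peers concurrently.
class Tracker {
    private final Map<String, Set<String>> chunkHolders = new HashMap<>();

    // A node registers itself as a holder of a chunk after downloading it.
    void register(String chunkId, String nodeAddress) {
        chunkHolders.computeIfAbsent(chunkId, k -> new HashSet<>())
                    .add(nodeAddress);
    }

    // A newcomer asks where a chunk can be downloaded from.
    Set<String> locate(String chunkId) {
        return chunkHolders.getOrDefault(chunkId, Set.of());
    }
}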

(c) Explain the role of an Edge-server network and state its functions.
Edge-server networks: In edge-server networks, as the name implies, servers are
placed at the “edge” of the Internet, for example at internet service providers (ISPs) or
close to enterprise networks. Client nodes (e.g., home users or an enterprise’s
employees) then access the nearby edge servers instead of the original server (which
may be located far away). This architecture is typically well suited for large-scale
content-distribution networks (e.g., Akamai).
Functions:
1. Mostly used for content and application distribution
2. Content distribution Networks
(20marks)

QUESTION THREE
(a) Communication in a distributed system involves the transfer of information in
synchronous and asynchronous modes. Explain the mode of transmission in each case.

QUESTION FOUR
(a) Servers hosting popular websites often receive more requests than they can handle,
causing their performance to suffer. A typical way of handling this problem is to
replicate the contents on other servers.
(i) Describe three (3) problems that may be introduced in the websites mentioned
in (a) and prescribe possible solutions to the problems you mention. (5marks)
(ii) Explain how the performance of a system may be improved by maintaining
consistency. (5marks)
(b)
(i) What are the functions of a client-centric consistency model? (3marks)
Client-centric Consistency Model defines how a data-store presents the data
value to an individual client when the client process accesses the data value
across different replicas.
 Client-centric consistency provides guarantees for a single client
concerning the consistency of accesses to a data store by that client.
 Assumption: Clients can access different replicas e.g. mobile users.
 Eventual consistency for replicated data is fine if clients always access
the same replica.
(ii) With the aid of a diagram, explain the parameters of a client-centric consistency
model. (3marks)

(iii) With the aid of a diagram, differentiate between monotonic reads and
monotonic writes and state the importance of each of them. (4marks)
In a monotonic-write consistent store, the following condition holds: A write
operation on data item x is completed before any successive write on x by the
same client. All writes by a single client are sequentially ordered. Thus
completing a write operation means that the copy on which a successive
operation is performed reflects the effect of a previous write operation by the
same process, no matter where that operation was initiated.

In a monotonic read, if a process reads the value of a data item x, any
successive read operation on x by that process will always return that
same value or a more recent value. In other words, monotonic-read
consistency guarantees that if a process has seen a value of x at time t,
it will never see an older version of x at a later time.
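A loose client-side sketch of the monotonic-read guarantee using version numbers (the Versioned wrapper and the stale-replica policy are illustrative assumptions, not a prescribed mechanism):

import java.util.Map;

// Client-side session guarantee for monotonic reads: remember the highest
// version this client has seen and refuse reads from staler replicas.
class MonotonicReadClient {
    static class Versioned {
        final String value;
        final long version;
        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private long highestVersionSeen = 0;

    String read(Map<String, Versioned> replica, String key) {
        Versioned v = replica.get(key);
        if (v == null || v.version < highestVersionSeen) {
            // This replica has not caught up with what the client already saw;
            // a real system would redirect the read to a fresher replica.
            throw new IllegalStateException("replica too stale for this client");
        }
        highestVersionSeen = Math.max(highestVersionSeen, v.version);
        return v.value;
    }
}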
QUESTION FIVE
(a) Give a brief description of a middleware and state how it can be used to improve the
general performance of a system. (5marks)
The term middleware applies to a software layer that provides a programming
abstraction as well as masking the heterogeneity of the underlying networks, hardware,
operating systems and programming languages. It is a separate layer of software that is
logically placed on top of the respective operating systems of the computers that are
part of the system.
 Communication: allows an application to invoke a function that is implemented
and executed on a remote computer as if it was locally available via remote
procedure call (RPC).
 Transactions: Many applications make use of multiple services that are
distributed among several computers. Middleware generally offers special
support for executing such services in an all-or-nothing fashion, commonly
referred to as an atomic transaction.
 Service composition: Web-based middleware can help by standardizing the way
Web services are accessed and providing the means to generate their functions
in a specific order.
 Reliability: Any message sent by one process is guaranteed to be received by
all or no other process.

(b)
(i) With the aid of a simple design of a distributed architectural object model, explain
the main functions of a client within the model. (4marks)

1. Optimization cannot be done by the programmer or user
2. Strange behaviour when the underlying system fails
3. The underlying system can be very complex
4. Users will circumvent the security in preference of productivity

pFad v4 Proxy