UNIT-1 Distributed System
models – types of networks – network principles – internet protocols – the API for internet
protocols – external data representation and marshalling – client-server communication
– group communication
All data and computational resources in a centralized system are kept and controlled in a single central place, such as a server. Applications and users connect to this hub in order to access and process data. Although this configuration is easy to maintain and secure, the central server can become a bottleneck when too many users access it simultaneously, and a single point of failure if it malfunctions.
A distributed system, on the other hand, spreads data and resources across several servers or sites, often in different physical locations. This configuration enables better scalability and reliability, since the system can keep functioning even when a component fails. However, because they have many points of interaction, distributed systems can be more difficult to secure and administer.
Common distributed system architectures include:
• Client-Server Architecture:
o In this setup, servers provide resources or services, and clients request them.
Clients and servers communicate over a network.
o Examples: Web applications, where browsers (clients) request pages from web
servers.
• Peer-to-Peer (P2P) Architecture:
o Each node, or “peer,” in the network acts as both a client and a server, sharing resources directly with other peers.
o Examples: File-sharing networks such as BitTorrent.
• Three-Tier Architecture:
o This model has three layers: presentation (user interface), application (business
logic), and data (database). Each layer is separated to allow easier scaling and
maintenance.
o Examples: Many web applications use this to separate user interfaces, logic
processing, and data storage.
• Microservices Architecture:
o The application is split into small, independent services, each handling specific
functions. These services communicate over a network, often using REST APIs
or messaging.
• Event-Driven Architecture:
o Components interact by publishing and responding to events rather than making direct requests. An event triggers specific actions or processes in various parts of the system (see the sketch after this list).
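As a rough Python illustration of the event-driven style (the event name and handlers are invented for the example), components register handlers and react to published events instead of calling each other directly:

# Minimal event-driven dispatcher: handlers subscribe to named events.
handlers = {}

def subscribe(event_name, handler):
    handlers.setdefault(event_name, []).append(handler)

def emit(event_name, payload):
    # An event triggers every handler registered for it.
    for handler in handlers.get(event_name, []):
        handler(payload)

subscribe("order_placed", lambda order: print("billing saw:", order))
subscribe("order_placed", lambda order: print("shipping saw:", order))
emit("order_placed", {"id": 42})  # both components react to one event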
The most common forms of distributed systems today operate over the internet, handing off
workloads to dozens of cloud-based virtual server instances that are created as needed, and
then terminated when the task is complete.
Example of a Distributed System
Consider a social media platform: its headquarters hosts the centralized computer network, while the computer systems that users access to consume its services act as the autonomous systems in the distributed system architecture.
• Database: It stores the data processed by each node/system of the distributed system connected to the centralized network.
• Each autonomous system runs a common application and can hold its own data, which is shared through the centralized database system.
• Middleware services provide capabilities that are missing by default from the local systems or the centralized system, acting as an interface between the centralized system and the local systems. Using middleware components, the systems communicate and manage data.
• The data transferred through the database is divided into segments or modules and shared with the autonomous systems for processing.
• The processed data is then transferred back to the centralized system over the network and stored in the database.
Characteristics of Distributed System
• Resource Sharing: It is the ability to use any Hardware, Software, or Data anywhere
in the System.
• Openness: It is concerned with extensions and improvements to the system (i.e., how openly the software is developed and shared with others).
• Concurrency: It is naturally present in distributed systems, where the same activity or functionality can be performed by separate users in remote locations. Every local system has its own independent operating system and resources.
• Fault tolerance: It concerns the reliability of the system: if there is a failure in hardware or software, the system continues to operate properly without degrading its performance.
• Transparency: It hides the complexity of the distributed system from users and application programs, so that the system appears as a single coherent whole.
Advantages of Distributed System
• Scalability: Distributed systems can easily grow by adding more computers (nodes), allowing them to handle increased demand without significant reconfiguration.
• Reliability and Fault Tolerance: If one part of the system fails, others can take over,
making distributed systems more resilient and ensuring services remain available.
• Performance: Workloads can be split across multiple nodes, allowing tasks to be
completed faster and improving overall system performance.
• Resource Sharing: Distributed systems allow resources like data, storage, and
computing power to be shared across nodes, increasing efficiency and reducing costs.
Disadvantages of Distributed System
• Security poses a problem because resources are shared across multiple systems, making data easier to access improperly.
• Network saturation may hinder data transfer; if there is lag in the network, users will face problems accessing data.
• In comparison to a single-user system, the database associated with a distributed system is much more complex and challenging to manage.
• If every node in a distributed system tries to send data at once, the network may become
overloaded.
Distributed systems and microservices are related concepts but not the same. Let’s break down
the differences:
1. Distributed Systems: a broad class of systems whose components run on multiple networked computers and coordinate their actions to achieve a common goal.
2. Microservices: an architectural style that structures a single application as a collection of small, independently deployable services.
While microservices can be implemented on a distributed system, they are not the same.
Microservices focus on architectural design principles, emphasizing modularity, scalability,
and flexibility, whereas distributed systems encompass a broader range of concepts, including
communication protocols, fault tolerance, and concurrency control, among others.
I. Physical Model
1. Nodes
Nodes are the end devices that can process data, execute tasks, and communicate with the other
nodes. These end devices are generally the computers at the user end or can be servers,
workstations, etc.
• Nodes provide the distributed system with an interface in the presentation layer that enables the user to interact with other back-end devices (nodes), which can be used for storage and database services, processing, web browsing, etc.
• Each node has an operating system, an execution environment, and different middleware requirements that facilitate communication and other vital tasks.
2. Links
Links are the communication channels between different nodes and intermediate devices.
These may be wired or wireless. Wired links or physical media are implemented using copper
wires, fiber optic cables, etc. The choice of the medium depends on the environmental
conditions and the requirements. Generally, physical links are required for high-performance
and real-time computing. Different connection types that can be implemented are as follows:
• Point-to-point links: Establish a connection and allow data transfer between only two
nodes.
• Multi-access links: Multiple nodes share the same communication channel to transfer data; these links require protocols to avoid interference during transmission.
3. Middleware
Middleware is the software installed and executed on the nodes. By running middleware on each node, the distributed computing system achieves decentralised control and decision-making. It handles various tasks such as communication with other nodes, resource management, fault tolerance, synchronisation of different nodes, and security against malicious and unauthorised access.
4. Network Topology
This defines the arrangement of nodes and links in the distributed computing system. The most common network topologies are bus, star, mesh, ring, and hybrid. The topology is chosen by examining the exact use cases and requirements.
5. Communication Protocols
Communication protocols are the set of rules and procedures for transmitting data over the links. Examples of these protocols include TCP, UDP, HTTPS, and MQTT. These allow the nodes to communicate and to interpret the data they exchange.
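As a small illustration, this Python sketch (standard library only; the address and port are placeholders) sends a single UDP datagram of the kind such protocols carry:

import socket

# UDP is connectionless: each sendto() is an independent datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"sensor-reading: 21.5", ("127.0.0.1", 9999))
sender.close()

# A receiving node would bind to the same port and read datagrams:
# receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# receiver.bind(("0.0.0.0", 9999))
# data, addr = receiver.recvfrom(1024)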
II. Architectural Model
The architectural model in a distributed computing system is the overall design and structure of the system: how its different components are organised to interact with each other and provide the desired functionalities. It gives an overview of the system and of how its development, deployment, and operations will take place. Constructing a good architectural model is required for efficient cost usage and highly improved scalability of the applications.
1. Client-Server model
It is a centralised approach in which clients initiate requests for services and servers respond by providing those services. It works mainly on the request-response model: the client sends a request to the server, and the server processes it and responds to the client accordingly.
• This is mainly used in web services, cloud computing, database management systems
etc.
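A minimal request-response sketch in Python (standard library; the loopback address and port are arbitrary) that mirrors this model:

import socket, threading

srv = socket.create_server(("127.0.0.1", 8080))  # server: bind and listen

def serve_one():
    # Server side: accept one connection, read the request, send a response.
    conn, _ = srv.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"response to: " + request)

threading.Thread(target=serve_one, daemon=True).start()

# Client side: initiate the request and wait for the server's reply.
with socket.create_connection(("127.0.0.1", 8080)) as client:
    client.sendall(b"GET /resource")
    print(client.recv(1024).decode())
srv.close()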
2. Peer-to-peer model
It is a decentralised approach in which all the distributed computing nodes, known as peers,
are all the same in terms of computing capabilities and can both request as well as provide
services to other peers. It is a highly scalable model because the peers can join and leave the
system dynamically, which makes it an ad-hoc form of network.
• The resources are distributed and the peers need to look out for the required resources
as and when required.
• The communication is directly done amongst the peers without any intermediaries
according to some set rules and procedures defined in the P2P networks.
3. Layered model
It involves organising the system into multiple layers, where each layer provisions a specific service. Each layer communicates with the adjacent layers using certain well-defined protocols without affecting the integrity of the system. A hierarchical structure is obtained in which each layer abstracts the underlying complexity of the lower layers.
4. Micro-services model
In this model, a complex application or task is decomposed into multiple independent services, each running on different servers. Each service performs only a single function and is focussed on a specific business capability. This makes the overall system more maintainable, scalable, and easier to understand. Services can be independently developed, deployed, and scaled without affecting the other running services.
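A toy sketch of one such service in Python (standard library only; the port, path, and payload are invented for the example), exposing a single business capability over HTTP:

from http.server import BaseHTTPRequestHandler, HTTPServer

class InventoryService(BaseHTTPRequestHandler):
    # One microservice, one business capability: report stock levels.
    def do_GET(self):
        if self.path == "/stock":
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(b'{"item": "widget", "count": 7}')
        else:
            self.send_error(404)

# Other services would call GET /stock over the network instead of
# importing this code directly.
HTTPServer(("127.0.0.1", 8001), InventoryService).serve_forever()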
III. Fundamental Model
1. Interaction Model
Distributed computing systems are full of processes interacting with each other in highly complex ways. The interaction model provides a framework for understanding the mechanisms and patterns used for communication and coordination among the various processes. Important components of this model are:
• Message Passing – It deals with passing messages, which may contain data, instructions, a service request, or process-synchronisation information, between different computing nodes. It may be synchronous or asynchronous depending on the type of task and process (a small sketch follows this list).
• Publish/Subscribe Systems – Also known as pub/sub systems. A publishing process publishes a message on a topic, and the processes subscribed to that topic receive it and act on it. This pattern is especially important in event-driven architectures.
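To make message passing concrete, here is a small Python sketch using an in-process queue as a stand-in for a network channel (asynchronous: the sender does not wait for the receiver):

import queue, threading

channel = queue.Queue()  # stand-in for a network link between two nodes

def worker_node():
    # Receiver: blocks until a message arrives, then processes it.
    message = channel.get()
    print("worker received:", message)

threading.Thread(target=worker_node).start()

# Sender: puts the message on the channel and continues immediately
# (asynchronous message passing).
channel.put({"op": "sync_state", "term": 3})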
• Remote Procedure Call (RPC) – It lets a process invoke a procedure on a remote node as if it were a local call; the underlying middleware marshals the arguments, transmits them, and unmarshals the result.
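A brief sketch using Python's built-in XML-RPC modules (one of many RPC mechanisms; the port number is arbitrary):

from xmlrpc.server import SimpleXMLRPCServer
import threading, xmlrpc.client

server = SimpleXMLRPCServer(("127.0.0.1", 8002), logRequests=False)
server.register_function(lambda a, b: a + b, "add")  # the remote procedure
threading.Thread(target=server.serve_forever, daemon=True).start()

# The caller invokes add() as if it were local; the library marshals the
# arguments, sends them to the server, and unmarshals the result.
proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8002/")
print(proxy.add(2, 3))  # prints 5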
2. Failure Model
This model addresses the faults and failures that occur in the distributed computing system. It
provides a framework to identify and rectify the faults that occur or may occur in the system.
Fault-tolerance mechanisms are implemented to handle failures through replication and through error detection and recovery methods. Different failures that may occur are:
• Timing failures – The process deviates from its expected time quantum and may lead
to delays or unsynchronised response times.
• Byzantine failures – The process may send malicious or unexpected messages that
conflict with the set protocols.
3. Security Model
Distributed computing systems may suffer malicious attacks, unauthorised access and data
breaches. Security model provides a framework for understanding the security requirements,
threats, vulnerabilities, and mechanisms to safeguard the system and its resources. Various
aspects that are vital in the security model are:
• Authentication: It verifies the identity of the users accessing the system, ensuring that only authorised and trusted entities get access. It typically involves credentials such as passwords, digital certificates, or biometrics.
• Encryption: It encodes data in transit and at rest so that only authorised parties holding the right keys can read it (a small sketch follows).
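A minimal symmetric-encryption sketch, assuming the third-party cryptography package is installed (pip install cryptography); the message contents are invented:

# Assumes the third-party "cryptography" package.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # shared secret between the two nodes
cipher = Fernet(key)

token = cipher.encrypt(b"user=alice;balance=100")  # unreadable in transit
print(cipher.decrypt(token))  # only a key holder can recover the data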
Distributed systems are networks of interconnected computers that work together to solve complex problems or perform tasks, using shared resources and communication protocols to achieve efficiency, scalability, and fault tolerance. From the fundamentals of distributed computing to the challenges of scalability, fault tolerance, and consistency, the sections below give a concise overview of the key principles for building resilient and efficient distributed systems.
Important Topics for Distributed System Principles
To build good distributed systems, you need to follow some important principles:
1. Decentralization
• Each node in a decentralized system works on its own but also works together with
others to get things done. So, if one node stops working, it does not affect the whole
system much because the others can still work independently.
2. Scalability
Scalability means how well a distributed system can handle growing workloads and resource demands. If more people start using a service or there is more data to process, a scalable system can handle it without slowing down much.
• There are two types: horizontal and vertical. Horizontal scalability means adding more
computers to the system, while vertical scalability means making each computer more
powerful.
• Techniques like spreading the work evenly, dividing it into parts, and sharing the load help make sure the system runs smoothly even as it gets bigger (a toy load-balancing sketch follows this list).
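A toy round-robin load balancer in Python (the node names are made up):

import itertools

# Round-robin: requests are spread evenly across the available nodes.
nodes = itertools.cycle(["node-1", "node-2", "node-3"])

def route(request):
    return next(nodes)  # each call picks the next node in turn

for i in range(6):
    print(f"request {i} -> {route(i)}")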
3. Fault Tolerance
Fault tolerance is about how well a distributed system can handle things going wrong. It means
the system can find out when something's not working right, fix it, and keep running smoothly.
• Since problems are bound to happen in complex systems, fault tolerance is crucial for
making sure the system stays reliable and available.
• Techniques like copying data or tasks onto different computers, keeping extra resources
just in case, and having plans to detect and recover from errors help reduce the impact
of failures.
• Also, there are strategies for automatically switching to backups when needed and for
making sure the system can still work even if it's not at full capacity.
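A simplified failover sketch in Python (the replica list and the fetch function are hypothetical placeholders for real network calls):

# Try each replica in order; fall back to the next one on failure.
replicas = ["primary:9000", "backup-1:9000", "backup-2:9000"]

def fetch_from(address):
    # Placeholder for a real network call that may raise on failure.
    if address.startswith("primary"):
        raise ConnectionError("primary is down")
    return f"data served by {address}"

def fetch_with_failover(replicas):
    for address in replicas:
        try:
            return fetch_from(address)
        except ConnectionError:
            continue  # failure detected; switch to the next backup
    raise RuntimeError("all replicas failed")

print(fetch_with_failover(replicas))  # served by backup-1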
4. Consistency
Consistency means making sure all parts of a distributed system have the same information
and act the same way, even if lots of things are happening at once. If things are not consistent,
it can mess up the data, break rules, and cause mistakes.
• Distributed systems keep things consistent by using methods like grouping multiple operations so they all finish together, or using locks to stop different parts from changing shared data at the same time (a lock sketch follows this list).
• There are different levels of consistency, like strong consistency where everything is
always the same, eventual consistency where it might take time but will get there, and
causal consistency which is somewhere in between. These levels depend on how
important it is for the system to work fast, be available, and handle problems.
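The locking technique mentioned above, sketched in Python with many threads updating one shared value:

import threading

balance = 0
lock = threading.Lock()  # serialises updates to the shared value

def deposit(amount):
    global balance
    with lock:  # without this, concurrent read-modify-write updates can be lost
        current = balance
        balance = current + amount

threads = [threading.Thread(target=deposit, args=(10,)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(balance)  # always 1000 because the lock keeps updates consistent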
5. Performance Optimization
Performance optimization means making a distributed system work faster and better by
improving how data is stored, how computers talk to each other, and how tasks are done.
• For example, using smart ways to store data across many computers and quickly find what's needed, such as caching frequently used results (a small caching sketch follows this list).
• Also, using efficient ways for computers to communicate, like sending messages in a
smart order to reduce delays. And, using clever ways to split up tasks between
computers and work on them at the same time, which speeds things up.
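A minimal caching sketch in Python; the remote query is a made-up placeholder, and the cache means repeated lookups avoid a second round trip:

from functools import lru_cache

@lru_cache(maxsize=1024)
def lookup(user_id):
    # Placeholder for an expensive remote query; the decorator caches each
    # result so repeated calls for the same key are answered locally.
    print(f"querying remote store for {user_id}")
    return (user_id, f"user-{user_id}")

lookup(7)  # hits the remote store
lookup(7)  # served from the cache, no second query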
What is Distributed Coordination?
Distributed coordination is important for making sure all the parts of a distributed system work together smoothly to achieve the same goals. In a distributed setup where lots of independent computers are working, coordination is crucial for making sure everyone is on the same page, managing resources fairly, and keeping everything running smoothly. Let's break down the main parts of distributed coordination:
• Raft: This consensus algorithm makes it simpler for a group of computers to agree on shared decisions by breaking the agreement process down into smaller steps.
• Semaphore Locks: They let a limited number of computers use something together, but not too many at once (a small sketch follows).
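The semaphore idea in Python, here limiting access to three concurrent holders (the worker count and sleep are arbitrary):

import threading, time

slots = threading.Semaphore(3)  # at most three holders at once

def use_shared_resource(worker_id):
    with slots:  # blocks if three workers already hold the resource
        print(f"worker {worker_id} using the resource")
        time.sleep(0.1)

for i in range(6):
    threading.Thread(target=use_shared_resource, args=(i,)).start()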
Messaging protocols help computers talk to each other so they can share information and coordinate what they're doing. They make sure messages get where they need to go and that everything keeps working even if there are problems.
• MQTT: It's good for sending messages in situations where there might be slow or weak connections, like in Internet of Things devices (sketched below).
• AMQP: This protocol is strong and reliable, perfect for big business systems where
messages need to get through no matter what.
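A hedged MQTT publishing sketch using the third-party paho-mqtt package (1.x-style API; the broker address and topic are placeholders):

# Assumes paho-mqtt (pip install paho-mqtt) and a reachable broker.
import paho.mqtt.client as mqtt

client = mqtt.Client()                               # paho-mqtt 1.x constructor
client.connect("broker.example.com", 1883)           # hypothetical broker
client.publish("sensors/room1/temperature", "21.5")  # fire-and-forget message
client.disconnect()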
Fault Tolerance in Distributed Systems
Fault tolerance is super important in designing distributed systems because it helps keep the
system running even when things go wrong, like if a computer breaks or the network has
problems. Here are some main ways to handle faults in distributed systems:
• Redundancy: Keeping extra copies of important stuff like hardware, software, or data
so if something breaks, there's a backup ready to take over. This helps avoid downtime
and keeps the system running smoothly.
• Error Detection and Recovery: Having tools in place to spot when something goes
wrong and fix it before it causes big problems. This might involve checking if
everything's okay, diagnosing issues, and taking steps to get things back on track.
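A toy heartbeat-based detector in Python (the timeout and node names are invented):

import time

last_heartbeat = {"node-1": time.time(), "node-2": time.time() - 30}
TIMEOUT = 10  # seconds without a heartbeat before a node is suspected

def suspected_failures(now=None):
    now = now or time.time()
    # A node that has not sent a heartbeat recently is flagged for recovery.
    return [n for n, t in last_heartbeat.items() if now - t > TIMEOUT]

print(suspected_failures())  # ['node-2']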
Data Management in Distributed Systems
Managing data in distributed systems is very important. It means handling data across many
computers while making sure it's consistent, reliable, and can handle a lot of work. In these
systems, data is spread across different computers to make things faster, safer, and able to
handle more work. Now, let's look at the main ways we do this and the technologies we use.
• Sharding: Splitting a big dataset into smaller parts and spreading them across different computers. Each computer handles its own part, which helps speed things up and avoids overloading any single computer (a hash-based sketch follows this list).
• Replication: Making copies of data and storing them on different computers. This
ensures that even if one computer fails, there are backups available. It also helps data
get to where it's needed faster.
• Consistency Models: These are rules that decide how data changes are seen across
different computers.
• Distributed Databases: These are databases spread across many computers. They use
techniques like sharding and replication to make sure data is available, consistent, and
safe. Examples: Cassandra, MongoDB.
• Distributed File Systems: These are like big digital storage spaces spread across many
computers. They break data into chunks and spread them out for faster access and
backup. Examples: HDFS, Amazon S3.
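The hash-based sharding idea in Python (the shard count is arbitrary):

import hashlib

NUM_SHARDS = 4  # arbitrary number of database nodes

def shard_for(key):
    # A stable hash (unlike Python's built-in hash(), which is salted per
    # process) so every node maps a given key to the same shard.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

for user in ["alice", "bob", "carol"]:
    print(user, "-> shard", shard_for(user))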
Security in Distributed Systems
Security is important in distributed systems because they are complicated and spread out across
many computers. We need to keep sensitive data safe, make sure our messages are not tampered
with, and protect against hackers. Here are the main ways we do this:
• Encryption: This means making data unreadable to anyone who shouldn't see it. We
do this when data is moving between computers or when it's stored somewhere. It keeps
sensitive information safe even if someone tries to snoop.
• Authentication: This is about making sure that the people, devices, or services trying
to access the system are who they say they are. We use things like passwords, fingerprint
scans, or special codes to check their identity.
• Access Control: This is like having locked doors that only certain people can open. We
decide who can see or change things in the system and make sure nobody else can get
in where they shouldn't.
• Audit Logging: This means keeping a record of everything that happens in the system
so we can check if something bad has happened or if someone tried to break in. It's like
having security cameras everywhere.
• DDoS Mitigation: Sometimes bad actors try to overwhelm the system with too much
traffic to shut it down. We use special tools to filter out this bad traffic and keep the
system running smoothly.
Real-World Examples of Distributed Systems
1. Google's Infrastructure
Google's setup is a big example of how distributed systems can work on a large scale. They use
stuff like Google File System (GFS), Bigtable, and MapReduce to manage huge amounts of
data. This helps them offer services like search, cloud computing, and real-time analytics
without any hiccups.
• Google File System (GFS):
o GFS is a special way of organizing and handling big amounts of data across
many computers. It's made to work even if some of those computers stop
working.
o GFS copies the data in different places to keep it safe, and it makes sure we can
still get to the data even if something goes wrong with one of the computers.
• Bigtable:
o Bigtable is a special kind of storage system that can hold huge amounts of
organized data across many computers. It's great for storing lots of information
and quickly finding what you need.
o Bigtable is used in things like Google Search, Gmail, and Google Maps because
it's so good at handling massive amounts of data efficiently.
• MapReduce:
o MapReduce is a way of programming and handling big amounts of data spread
across many computers. It's like having lots of people working on different parts
of a big project at the same time.
o This helps to get things done faster and handle really huge amounts of data. It's
great for jobs like analyzing data or doing tasks in big batches.
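The MapReduce idea in miniature, in plain Python (a word count over fake document chunks; real MapReduce runs the map and reduce phases on many machines):

from collections import Counter
from functools import reduce

chunks = ["to be or not to be", "to thine own self be true"]  # fake input

def map_phase(chunk):
    # Map: each chunk is processed independently (in parallel on a cluster).
    return Counter(chunk.split())

def reduce_phase(left, right):
    # Reduce: partial counts from every mapper are merged into one result.
    return left + right

print(reduce(reduce_phase, map(map_phase, chunks)))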
2. Twitter
Twitter uses a bunch of fancy computer systems to handle all the people who use it and the
messages they send in real-time. They use things like Apache Mesos and Apache Aurora to
make sure everything works smoothly even when there are millions of tweets happening every
day. It's like having a really strong foundation to support a huge building - it keeps everything
running smoothly and reliably.
• Microservices Architecture:
o Twitter's setup is a puzzle where each piece does its own job. They've divided
their system into smaller parts, called microservices, and each one takes care of
a different thing, like sending tweets or handling notifications.
o By doing this, Twitter can adjust things easily when lots of people are using it,
making sure it runs smoothly no matter what.
• Apache Mesos:
o Mesos acts like a boss for a bunch of computers, helping them share and use their power better. It handles things like how much memory or space each computer has and makes sure everything runs smoothly.
o For Twitter, Mesos is super helpful because it helps them run lots of little
programs more efficiently, saving time and making things easier to manage.
• Apache Aurora:
o Aurora is a smart manager for computer systems. It helps organize and run different tasks and services on a bunch of machines.
o It's designed to make sure everything runs smoothly, even if something goes
wrong with one of the machines.
o With Aurora, Twitter can easily set up and manage its services, making sure
they're always available and working well.
The API for Internet Protocols
This programmer's reference describes an interface to the transport layer of the Basic Reference
Model of Open Systems Interconnection (OSI). Although the API is capable of interfacing to
proprietary protocols, the Internet open network protocols are the intended providers of the
transport service. This document uses the term "open" to emphasize that any system
conforming to one of these standards can communicate with any other system conforming to
the same standard, regardless of vendor. These protocols are contrasted with proprietary
protocols that generally support a closed community of systems supplied by a single vendor.
External Data Representation and Marshalling
The information stored in running programs is represented as data structures – for example, by
sets of interconnected objects – whereas the information in messages consists of sequences of
bytes. Irrespective of the form of communication used, the data structures must be flattened
(converted to a sequence of bytes) before transmission and rebuilt on arrival.
The individual primitive data items transmitted in messages can be data values of many
different types, and not all computers store primitive values such as integers in the same order.
The representation of floating-point numbers also differs between architectures. To support communication, any data type that can be passed as an argument or returned as a result must be able to be flattened, and the individual primitive data values must be represented in an agreed format.
External data representation – an agreed standard for the representation of data structures and primitive values.
Marshalling – the process of taking a collection of data items and assembling them into a form suitable for transmission in a message.
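A minimal marshalling sketch in Python's standard library, flattening a small (id, temperature) record (the field choice is invented) into network byte order and rebuilding it on arrival:

import struct

# Marshal: flatten the record into bytes using an agreed format string.
# "!" selects network byte order, so both ends interpret the values alike.
record = (42, 21.5)
message = struct.pack("!if", *record)   # int32 + float32

# Unmarshal: the receiver rebuilds the data items from the byte sequence.
node_id, temperature = struct.unpack("!if", message)
print(node_id, temperature)  # 42 21.5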
Client-Server Communication
Client and server communication takes place when both are connected to each other via a network. The client and the server are two individual computing systems, each having its own operating system, applications, and functions. When connected via a network, they are able to share their applications with each other.
It is not necessary that the client and server use the same operating system platform; many varied operating systems can be connected with each other for advanced communication using a communication protocol. The responsibility of implementing the communication protocol lies with an application known as communication software.
Using the features of communication software, the client and server can exchange files and data for effective communication. The process of communication between client and server can be explained as follows:
• The client establishes a connection to the server over the network.
• The client sends a request for a service or a resource.
• The server processes the request and prepares a response.
• The server sends the response back to the client, which then uses the result.