UNIT 2
Distributed System
Key Points of Distributed Shared Memory (DSM):
1. Shared Memory Illusion: DSM provides an illusion of a single shared
memory, even though the memory is physically distributed across
multiple machines.
2. Communication Simplification: Instead of sending explicit messages
between computers, programs can simply read and write to the shared
memory.
3. Transparency: The system hides the complexity of data transfer between
nodes, making it appear as though all memory is local.
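4. Consistency: DSM keeps the shared data consistent according to a chosen
consistency model, so that when one node updates the shared memory, other
nodes see the change (immediately or at defined points, depending on the model).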
5. Applications: DSM is used in parallel computing, distributed systems,
and for tasks requiring high-speed data sharing between nodes.
Example Analogy:
Imagine a group of people working together on a whiteboard. Each person can
write and read from the same whiteboard without having to pass notes to each
other. DSM is like that whiteboard in a computer network.
Design & Implementation Issues in DSM Systems
When creating a Distributed Shared Memory (DSM) system, there are several
challenges and decisions to address. Here's a simple explanation:
1. Memory Management
• Problem: How to organize and manage memory across multiple
computers.
• Solution: Divide the shared address space into fixed-size pages (like
chapters in a book) and keep track of which node holds each page so it can be
found quickly.
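The page idea above can be sketched in a few lines. This is a minimal illustration, assuming a fixed page size and a simple round-robin rule that gives every page a "home" node; PAGE_SIZE, NUM_NODES, locate() and the round-robin rule are hypothetical choices, not the design of any particular DSM system.

```python
PAGE_SIZE = 4096   # assumed page size in bytes (hypothetical)
NUM_NODES = 4      # assumed number of nodes in the DSM cluster (hypothetical)

def locate(address: int) -> tuple[int, int, int]:
    """Split a shared address into (page number, offset in page, home node)."""
    page = address // PAGE_SIZE
    offset = address % PAGE_SIZE
    home_node = page % NUM_NODES   # round-robin: page 0 -> node 0, page 1 -> node 1, ...
    return page, offset, home_node

print(locate(10000))  # (2, 1808, 2)
```

Any node can compute where a page lives without asking a central server, which is one reason simple static mappings like this are popular as a starting point.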
2. Data Placement
• Problem: Where should data be stored across the network?
• Solution: Place data near the nodes that use it the most to reduce
delays.
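One way to "place data near the nodes that use it the most" is to count accesses per node and move a page's home to the busiest node. The sketch below assumes such an access log exists; best_home() and the log format are hypothetical.

```python
from collections import Counter

def best_home(access_log: list[int]) -> int:
    """Return the node id that accessed a page most often (ties go to the lowest id)."""
    counts = Counter(access_log)
    return min(counts, key=lambda node: (-counts[node], node))

# Hypothetical access log for one page: the node id of each access, in order
log = [1, 3, 3, 0, 3, 1]
print(best_home(log))  # 3
```

A real system would also weigh the cost of moving the page against the savings, otherwise a page can bounce between nodes.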
3. Data Consistency
• Problem: If one computer updates the shared memory, how do others
see the changes?
• Solution: Use a consistency model (such as sequential or eventual
consistency) that defines when an update becomes visible to other nodes.
4. Communication Overhead
• Problem: Sharing data between computers involves network
communication, which can be slow.
• Solution: Minimize unnecessary communication by caching data or
batching updates.
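Batching can be sketched as a writer that buffers updates locally and sends them in one message once enough have piled up. This is an illustrative sketch; BatchingWriter, the send callback and the batch size of 3 are assumptions. Note how two writes to the same address collapse into one.

```python
class BatchingWriter:
    """Buffer writes locally and send them to other nodes in one message per batch."""

    def __init__(self, send, batch_size: int = 3):
        self.send = send                  # callable that transmits a list of (addr, value) pairs
        self.batch_size = batch_size
        self.pending: dict[int, int] = {}

    def write(self, addr: int, value: int) -> None:
        self.pending[addr] = value        # a later write to the same address replaces the earlier one
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.send(list(self.pending.items()))
            self.pending.clear()

messages = []                             # stands in for the network
w = BatchingWriter(messages.append)
for addr, value in [(0, 1), (0, 2), (8, 5), (16, 7)]:
    w.write(addr, value)
w.flush()
print(len(messages), messages)  # 1 [[(0, 2), (8, 5), (16, 7)]]
```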
5. Synchronization
• Problem: How to handle situations where multiple computers try to
access or modify the same data at the same time.
• Solution: Use locks, semaphores, or atomic operations to manage
access.
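The effect of a lock can be shown on a single machine, with threads standing in for nodes; a real DSM would need a distributed lock, which this sketch does not implement. Without the lock, the four threads could interleave their read-modify-write steps and lose updates.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times: int) -> None:
    global counter
    for _ in range(times):
        with lock:            # only one thread may read-modify-write at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```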
6. Fault Tolerance
• Problem: What happens if one computer crashes or loses connection?
• Solution: Implement mechanisms to detect failures and replicate data to
avoid loss.
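Failure detection is often done with heartbeats: each node reports periodically, and a node that stays silent longer than a timeout is suspected to have failed (its data can then be served from a replica). The sketch below assumes this timeout-based approach; HeartbeatMonitor is hypothetical, and time is passed in explicitly to keep the example deterministic. A timeout cannot tell a crashed node from a slow network, which is why the result is only a suspicion.

```python
class HeartbeatMonitor:
    """Suspect a node as failed if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def suspected(self, now: float) -> list[str]:
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)

monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("A", now=0.0)
monitor.heartbeat("B", now=0.0)
monitor.heartbeat("A", now=4.0)     # B has gone quiet
print(monitor.suspected(now=5.0))   # ['B']
```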
7. Scalability
• Problem: Can the system handle more computers as needed?
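• Solution: Design the system so that adding nodes does not noticeably
slow it down, for example by spreading page ownership and directory
information across nodes instead of keeping it on one central server.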
8. Security
• Problem: How to protect shared memory from unauthorized access or
attacks.
• Solution: Use encryption, authentication, and access controls.
Example Analogy:
Imagine a group project where everyone works on a shared document:
• Memory Management: Decide how to divide and assign parts of the
document.
• Data Placement: Keep sections with the person who edits them the
most.
• Synchronization: Ensure two people don’t edit the same sentence at the
same time.
These issues must be carefully addressed to create an efficient and reliable
DSM system!
Consistency Models in DSM
1. Strict Consistency
• What it means: Any read of a shared memory location always returns the
value written by the most recent write to it, in absolute (real) time.
• Example: Imagine a chalkboard where if one person writes something,
everyone immediately sees it.
• Pros: Perfect synchronization.
• Cons: Practically impossible to implement in a distributed system,
because it requires a global clock and instant propagation of every write
to all nodes, and real networks always have some delay.
2. Sequential Consistency
• What it means: All nodes see all memory operations (reads and writes) in
the same order. That order need not match real time, but it must respect
the order in which each individual node issued its own operations.
• Example: Think of a queue where actions happen one by one. If Node A
and Node B both write at about the same time, all nodes will see those
writes in the same sequence, whether that is A first and then B, or B
first and then A.
• Pros: Easier to implement than strict consistency.
• Cons: Can still be slow due to synchronization requirements.
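One common way to get sequential consistency is to send every write through a single sequencer that assigns a global order, and have every replica apply writes in that order. This is a sketch of that one approach (there are others); Sequencer and Replica are hypothetical names. The single sequencer is also exactly where the "can still be slow" cost comes from.

```python
import itertools

class Sequencer:
    """Assigns one global order to all writes; every write passes through it."""

    def __init__(self):
        self._next = itertools.count()
        self.log: list[tuple[int, str, int]] = []

    def submit(self, var: str, value: int) -> None:
        self.log.append((next(self._next), var, value))

class Replica:
    def __init__(self):
        self.memory: dict[str, int] = {}
        self.history: list[tuple[str, int]] = []

    def apply(self, log) -> None:
        for _, var, value in sorted(log):   # apply writes in sequence-number order
            self.memory[var] = value
            self.history.append((var, value))

seq = Sequencer()
seq.submit("x", 1)   # write from Node A
seq.submit("x", 2)   # concurrent write from Node B; the sequencer happened to see it second
replicas = [Replica() for _ in range(3)]
for r in replicas:
    r.apply(seq.log)
print([r.memory["x"] for r in replicas])  # [2, 2, 2]
```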
3. Causal Consistency
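• What it means: If one write can causally affect another (for example, a
node reads a value and then writes a new value based on it), all nodes must
see the two writes in that order. Writes that are not causally related
(concurrent writes) can be seen in different orders on different nodes.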
• Example: If you post a message ("Hello") and someone replies to it
("Hi"), all nodes will see "Hello" before "Hi." But two unrelated messages
can appear in any order.
• Pros: More flexible and faster than sequential consistency.
• Cons: Slightly more complex to implement.
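Systems usually track "which write could have caused which" with vector clocks: one counter per node attached to each write. The sketch below uses vector clocks for the "Hello"/"Hi" example; the clock values are illustrative. A replica would hold back a write until every write that happened before it has been applied.

```python
def happened_before(a: list[int], b: list[int]) -> bool:
    """True if the event with vector clock `a` causally precedes the event with clock `b`."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a: list[int], b: list[int]) -> bool:
    return not happened_before(a, b) and not happened_before(b, a)

# Vector clocks for three nodes [A, B, C]
hello = [1, 0, 0]   # A posts "Hello"
hi    = [1, 1, 0]   # B replies after seeing "Hello"
other = [0, 0, 1]   # C posts an unrelated message

print(happened_before(hello, hi))  # True: every node must show "Hello" before "Hi"
print(concurrent(hi, other))       # True: nodes may show these in either order
```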
4. Eventual Consistency
• What it means: Changes to shared memory will eventually become
visible to all nodes, but not immediately.
• Example: A news update takes time to reach everyone, but eventually,
everyone gets the same information.
• Pros: Very fast and used in systems like NoSQL databases (e.g.,
DynamoDB, Cassandra).
• Cons: Temporary inconsistencies can occur, which might confuse some
applications.
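Replicas in an eventually consistent system need a rule for agreeing once they exchange updates. A common, simple rule is "last writer wins" by timestamp, sketched below; LWWRegister, merge() and the integer timestamps are assumptions for illustration. The trade-off is that one of two concurrent writes is silently discarded.

```python
class LWWRegister:
    """Last-writer-wins register: keeps the value with the highest (timestamp, node_id)."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.stamp = (0, "")
        self.value = None

    def write(self, value, timestamp: int) -> None:
        self._apply(value, (timestamp, self.node_id))

    def _apply(self, value, stamp) -> None:
        if stamp > self.stamp:
            self.stamp, self.value = stamp, value

    def merge(self, other: "LWWRegister") -> None:
        """Exchange step between replicas: adopt the other value if it is newer."""
        self._apply(other.value, other.stamp)

a, b = LWWRegister("A"), LWWRegister("B")
a.write("news v1", timestamp=1)
b.write("news v2", timestamp=2)
print(a.value, b.value)   # news v1 news v2   (temporarily inconsistent)
a.merge(b)
b.merge(a)
print(a.value, b.value)   # news v2 news v2   (converged)
```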
5. Weak Consistency
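• What it means: Consistency is guaranteed only at explicit synchronization
points. A node's updates become visible to other nodes (and it sees theirs)
when a synchronization operation is performed.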
• Example: You’re editing a document. Changes become visible to others
only when you save the file.
• Pros: Improves performance by reducing communication overhead.
• Cons: Requires careful programming to avoid errors.
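The "save the file" example can be sketched as a node that keeps its writes private until it calls a synchronization operation. This is a simplified single-process sketch; WeakNode, sync() and the shared dictionary standing in for other nodes' view are hypothetical, and a real sync would also pull in other nodes' updates.

```python
class WeakNode:
    """Local writes stay private until sync() is called (the synchronization point)."""

    def __init__(self, shared: dict):
        self.shared = shared     # stands in for the memory other nodes read
        self.local: dict = {}

    def write(self, key, value) -> None:
        self.local[key] = value  # not yet visible to others

    def read(self, key):
        return self.local.get(key, self.shared.get(key))

    def sync(self) -> None:
        self.shared.update(self.local)   # publish all buffered writes at once
        self.local.clear()

shared = {}
editor, reader = WeakNode(shared), WeakNode(shared)
editor.write("doc", "draft 2")
print(reader.read("doc"))   # None (not saved yet)
editor.sync()
print(reader.read("doc"))   # draft 2
```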
6. Release Consistency
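• What it means: Shared data is accessed inside critical sections that
begin with an "acquire" and end with a "release". Updates made inside the
critical section become visible to other nodes when the "release" is
performed, and a node that later performs an "acquire" is guaranteed to
see them.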
• Example: Imagine workers at a construction site. Everyone gets updated
instructions only after the foreman announces them.
• Pros: High performance for certain applications.
• Cons: Programmers need to manage release operations.
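The acquire/release pattern can be sketched with a lock whose manager holds the master copy of the data: acquire hands out an up-to-date private copy, and release pushes the edited copy back. This is a single-machine illustration under that assumption; ReleaseConsistentRegion is hypothetical and threads stand in for nodes.

```python
import threading

class ReleaseConsistentRegion:
    """Data guarded by a lock; updates are published at release and fetched at acquire."""

    def __init__(self, initial: dict):
        self._lock = threading.Lock()
        self._master = dict(initial)      # copy held by the lock's manager

    def acquire(self) -> dict:
        self._lock.acquire()
        return dict(self._master)         # the acquirer gets an up-to-date private copy

    def release(self, local_copy: dict) -> None:
        self._master = dict(local_copy)   # publish all updates made inside the critical section
        self._lock.release()

region = ReleaseConsistentRegion({"beams": 0})

def worker() -> None:
    data = region.acquire()
    data["beams"] += 1                    # edits stay private until release
    region.release(data)

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final = region.acquire()
region.release(final)
print(final)  # {'beams': 5}
```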
Summary Table
Model | Guarantee | Speed | Typical Use
Eventual Consistency | Eventually consistent | Fast | Cloud databases (NoSQL)
What is Thrashing?
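Thrashing happens in a computer system when it spends more time swapping
data between RAM and the hard drive (or other storage) than doing actual
work. This can make the system very slow. In a DSM system, the same problem
appears when a shared page is repeatedly transferred back and forth between
nodes that take turns accessing it.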
Simple Analogy:
Imagine you’re studying with limited desk space. If your desk is too small to
hold all your books, you keep putting books on the shelf and taking them back.
You waste time moving books instead of studying. This is like thrashing in a
computer!
Thrashing is a sign that the system is struggling with memory management and
needs attention to work efficiently.
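The desk analogy can be made concrete with a small page-fault count. The sketch assumes LRU page replacement and a program that keeps cycling over 4 pages. With 4 frames ("enough desk space") only the first use of each page faults; with 3 frames, every single access faults, because LRU always evicts the page that is needed next.

```python
from collections import OrderedDict

def count_page_faults(references: list[int], frames: int) -> int:
    """Simulate LRU page replacement and count page faults."""
    memory = OrderedDict()
    faults = 0
    for page in references:
        if page in memory:
            memory.move_to_end(page)          # mark as most recently used
        else:
            faults += 1
            if len(memory) == frames:
                memory.popitem(last=False)    # evict the least recently used page
            memory[page] = None
    return faults

workload = [0, 1, 2, 3] * 5   # 20 accesses cycling over 4 pages
for frames in (4, 3):
    print(frames, count_page_faults(workload, frames))
# 4 4
# 3 20
```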
2. Addressing
Each segment in the shared memory space has a unique address so processes
can find and access it:
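• Logical Address: The address a program uses to refer to shared memory;
the system translates it to a physical location, which keeps programs simple.
• Physical Address: The actual location in memory (RAM), which in a DSM
system may be on another node.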
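Translation from a logical address to a physical one can be sketched with a segment table. The table contents, node names and translate() below are hypothetical; the point is that the program only ever uses (segment, offset), and the system finds the node and real address.

```python
# Hypothetical segment table: segment id -> (node holding it, base address in that node's RAM, length)
SEGMENT_TABLE = {
    0: ("node-A", 0x1000, 0x400),
    1: ("node-B", 0x8000, 0x200),
}

def translate(segment: int, offset: int) -> tuple[str, int]:
    """Map a logical address (segment, offset) to a physical (node, address) pair."""
    node, base, length = SEGMENT_TABLE[segment]
    if not 0 <= offset < length:
        raise IndexError(f"offset {offset:#x} outside segment {segment}")
    return node, base + offset

node, addr = translate(1, 0x10)
print(node, hex(addr))   # node-B 0x8010
```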
3. Access Control
To ensure safety, access to the shared memory is controlled:
• Read-Only Access: Some processes can only read data but not modify it.
• Read-Write Access: Some processes can both read and write data.
• Access is managed using permissions to prevent conflicts or errors.
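A permission check can be sketched as a lookup in a table of (process, segment) pairs, reusing the teacher/student idea from the analogy. The table, the Perm flags and check() are illustrative assumptions; any pair not in the table gets no access.

```python
from enum import Flag, auto

class Perm(Flag):
    READ = auto()
    WRITE = auto()

# Hypothetical permission table: (process, segment) -> allowed operations
PERMISSIONS = {
    ("teacher", "math"): Perm.READ | Perm.WRITE,
    ("student", "math"): Perm.READ,
}

def check(process: str, segment: str, wanted: Perm) -> bool:
    granted = PERMISSIONS.get((process, segment), Perm(0))   # default: no access
    return wanted in granted

print(check("student", "math", Perm.READ))    # True
print(check("student", "math", Perm.WRITE))   # False
```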
4. Synchronization Mechanisms
To avoid multiple processes modifying the same data at the same time:
• Locks: Prevent simultaneous access.
• Semaphores: Allow controlled access by multiple processes.
• Barriers: Ensure all processes reach a certain point before proceeding.
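Semaphores and barriers can be tried out with Python's threading module, with threads standing in for processes. In this sketch, the semaphore lets at most 2 threads into the shared region at once, and the barrier makes every thread wait until all 4 have finished phase 1; the counters exist only to show that both rules held.

```python
import threading

slots = threading.Semaphore(2)    # at most 2 processes inside the shared region at once
barrier = threading.Barrier(4)    # all 4 processes must finish phase 1 before phase 2
state_lock = threading.Lock()
inside = 0
max_inside = 0
phase1_done = 0
seen_at_phase2 = []

def process() -> None:
    global inside, max_inside, phase1_done
    with slots:
        with state_lock:
            inside += 1
            max_inside = max(max_inside, inside)
        # ... work on shared data ...
        with state_lock:
            inside -= 1
            phase1_done += 1
    barrier.wait()                # block until every process reaches this point
    with state_lock:
        seen_at_phase2.append(phase1_done)

threads = [threading.Thread(target=process) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_inside <= 2, seen_at_phase2)  # True [4, 4, 4, 4]
```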
5. Consistency Management
The system ensures that:
• When one process updates data, other processes see the updated value.
• Changes are synchronized across all users of the shared memory.
Simple Analogy:
Think of shared memory space like a whiteboard in a classroom:
• Segments: Different sections of the whiteboard for different subjects
(math, science, etc.).
• Addressing: Each section has a label, so you know where to write or
read.
• Access Control: Rules decide who can write or just read (e.g., students
vs. teachers).
• Synchronization: Only one person writes at a time to avoid overwriting.
File Model
In a distributed system, where files are stored and managed across multiple
computers or servers, file models are still classified based on structure and
modifiability. Let’s explain this in simple terms:
1. Based on Structure
This focuses on how data is organized in the file, even when it’s spread across
multiple systems.
• Structured Files:
o Data is highly organized and follows a predictable format (e.g.,
rows, columns, or key-value pairs).
o These files make it easier for distributed systems to process, share,
and query the data quickly.
o Example: A distributed database like HBase or a file stored in a
table-like structure in Hadoop.
o Analogy: Think of a shared online spreadsheet where every entry
fits neatly into a predefined box.
• Unstructured Files:
o Data does not follow a strict format. It could be plain text, images,
videos, or logs.
o In distributed systems, unstructured data requires extra tools (like
indexing or AI) to organize or analyze it.
o Example: A distributed storage system like Amazon S3 storing
photos, videos, or PDFs.
o Analogy: Imagine sharing random notes or pictures with no
consistent arrangement.
2. Based on Modifiability
This focuses on whether files can be changed after being stored in the
distributed system.
• Mutable Files:
o Files can be updated or edited directly, even when distributed.
o These systems allow changes but often require coordination to
ensure all copies (on different servers) stay consistent.
o Example: A shared Google Doc that multiple users can edit
simultaneously.
o Challenge: In distributed systems, ensuring consistency (so all
servers have the same version) is hard and requires protocols like
locks or version control.
• Immutable Files:
o Files cannot be edited after being created. If changes are needed,
a new version of the file is created instead.
o These systems are easier to manage in distributed environments
because no coordination is required for updates.
o Example: Log files in systems like Apache Kafka, where each record
is appended and never changed.
o Analogy: Like sending a letter—it’s permanent once written and
sent. If you want changes, you write a new one.
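An immutable, append-only file can be sketched as a list that only grows. This is in the spirit of a Kafka-style log but is not Kafka's API; AppendOnlyLog and its methods are hypothetical. A "change" is recorded as a new entry, so no coordination is needed to edit existing records.

```python
class AppendOnlyLog:
    """Records can be appended and read by offset, but never modified in place."""

    def __init__(self):
        self._records: list[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1     # offset of the new record

    def read(self, offset: int) -> str:
        return self._records[offset]

    def records(self) -> tuple[str, ...]:
        return tuple(self._records)       # a tuple, so callers cannot edit stored records

log = AppendOnlyLog()
log.append("order 17 created")
log.append("order 17 paid")               # a change is a new entry, not an edit
print(log.read(0), "|", log.read(1))      # order 17 created | order 17 paid
```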
Example in Practice:
• Structured + Mutable: A shared database that multiple servers can
update, like a distributed SQL database.
• Unstructured + Immutable: A distributed object store like Amazon S3 storing
videos or backups that don’t change after upload.
Understanding this helps design efficient distributed systems based on the type
of files and operations needed!
File Sharing Semantics
File Sharing Semantics in distributed systems define the rules for how changes
to files are shared and visible to users. Here’s a simple explanation of four
specific types: Unix, Session, Immutable, and Transaction-like Semantics:
1. Unix Semantics
• What It Means: Every user sees the most recent changes to a file
immediately after any modification.
• How It Works:
o If User A edits a file, any read by User B after that edit returns the
updated data, as if they were both working on the same computer.
o This is the behavior of traditional Unix file systems, such as the local
file systems on Linux.
• Challenge in Distributed Systems: Achieving this in a distributed system is
difficult because updates need to sync across all servers instantly.
• Analogy: Imagine writing on a whiteboard. Everyone can see the changes
as you write.
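The simplest way to get Unix semantics is to keep one authoritative copy and send every read and write to it, with no client caching. The sketch assumes that design; FileServer is hypothetical. It also shows why this is costly in a distributed system: in a real deployment every operation would be a network round trip.

```python
class FileServer:
    """Unix semantics sketch: one authoritative copy; every read and write goes to it."""

    def __init__(self):
        self.files: dict[str, str] = {}

    def write(self, name: str, data: str) -> None:
        self.files[name] = data

    def read(self, name: str) -> str:
        return self.files[name]

server = FileServer()
server.write("notes.txt", "v1")        # User A writes
print(server.read("notes.txt"))        # User B reads right away: v1
server.write("notes.txt", "v2")        # User A writes again
print(server.read("notes.txt"))        # User B sees v2 at once
```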
2. Session Semantics
• What It Means: Changes made by a user are visible only to that user
during their editing session. Other users see the changes only after the
session ends (when the file is closed and saved).
• How It Works:
o If User A opens a file, edits it, and saves it, User B will only see the
changes after User A finishes and closes the file.
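o This reduces network traffic, because the file is transferred only when
it is opened and closed. However, if two users edit the same file at once,
the last one to close it overwrites the other’s changes.
• Common Use: The Andrew File System (AFS) is the classic example; sync
services like Dropbox behave similarly, uploading changes after a file is saved.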
• Analogy: Think of borrowing a book. You can read and make notes, but
others won’t see your changes until you return it.
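Session semantics can be sketched as each open() handing out a private copy that is written back only on close(). SessionFileServer and Session are hypothetical names. The example also shows the "last close wins" behavior when two users edit at the same time.

```python
class SessionFileServer:
    """Session semantics sketch: clients work on a private copy; it is uploaded on close."""

    def __init__(self):
        self.files: dict[str, str] = {}

    def open(self, name: str) -> "Session":
        return Session(self, name, self.files.get(name, ""))

class Session:
    def __init__(self, server: SessionFileServer, name: str, data: str):
        self.server, self.name, self.data = server, name, data

    def write(self, data: str) -> None:
        self.data = data                           # changes stay in this session's copy

    def close(self) -> None:
        self.server.files[self.name] = self.data   # whole file written back at close

server = SessionFileServer()
a = server.open("report.txt")
b = server.open("report.txt")
a.write("A's version")
print(repr(server.open("report.txt").data))   # '' (A has not closed yet)
b.write("B's version")
a.close()
b.close()                                      # last close wins
print(server.files["report.txt"])              # B's version
```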
3. Immutable Semantics
• What It Means: Files cannot be modified once created. If changes are
needed, a new version of the file is created instead.
• How It Works:
o If User A wants to update a file, they create a new version (e.g.,
"file_v2") rather than editing the original file.
o This ensures consistency, as the original file remains unchanged.
• Common Use: Systems like Git or blockchain use this approach to avoid
conflicts and maintain history.
• Analogy: Think of taking a photo. You can’t change the photo itself, but
you can take another one if needed.
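Immutable semantics can be sketched as a store where "updating" a file always creates the next numbered version and old versions stay readable. ImmutableStore and the "file_v2" naming are illustrative assumptions that follow the example above.

```python
class ImmutableStore:
    """Files are never overwritten; an 'update' creates the next version."""

    def __init__(self):
        self._versions: dict[str, list[str]] = {}

    def create(self, name: str, data: str) -> str:
        versions = self._versions.setdefault(name, [])
        versions.append(data)
        return f"{name}_v{len(versions)}"   # e.g. "file_v1", "file_v2", ...

    def read(self, name: str, version: int) -> str:
        return self._versions[name][version - 1]

store = ImmutableStore()
print(store.create("file", "first draft"))    # file_v1
print(store.create("file", "second draft"))   # file_v2
print(store.read("file", 1))                  # first draft (unchanged)
```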
4. Transaction-like Semantics
• What It Means: File changes happen in a series of steps, and all the steps
must succeed or fail together (like a database transaction). Concurrent
transactions on the same file behave as if they ran one after another.
• How It Works:
o If User A wants to update a file, the system ensures all changes are
completed successfully. If something goes wrong, the file reverts
to its original state (no partial updates are allowed).
o This ensures data integrity, especially in critical systems.
• Common Use: Used in systems requiring high reliability, like banking or
airline reservation systems.
• Analogy: Think of transferring money between two accounts. The
transfer only succeeds if both accounts are updated correctly—
otherwise, it’s canceled.
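For a single file, all-or-nothing updates are often done with the "write to a temporary file, then rename" technique: the new contents are fully written first, and an atomic rename makes them visible in one step. This is one well-known technique, not a full transaction system (updates spanning several files need logging or commit protocols). The sketch relies on os.replace, which is atomic on POSIX when both paths are on the same file system; atomic_write() is a hypothetical helper.

```python
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    """All-or-nothing file update: readers see either the old contents or the new, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)   # temp file on the same file system
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())                   # make sure the bytes reach the disk
        os.replace(tmp_path, path)                   # atomic rename: the "commit" step
    except BaseException:
        os.remove(tmp_path)                          # "abort": discard the partial update
        raise

demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, "accounts.txt")
atomic_write(target, "A=100\nB=50\n")
atomic_write(target, "A=50\nB=100\n")   # the "transfer" lands as a single step
with open(target) as f:
    print(f.read())
```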
Summary
1. Unix Semantics: Changes are immediately visible to everyone.
2. Session Semantics: Changes are visible to other users only after the file is closed.
3. Immutable Semantics: Files cannot be changed, only replaced with new
versions.
4. Transaction-like Semantics: Changes are made in an all-or-nothing
manner, ensuring data consistency.
Each semantic is chosen based on the needs of the distributed system,
balancing speed, consistency, and reliability.