Dist_Sys_Unit_4_Notes

The document discusses distributed file systems (DFS), which manage file organization, storage, retrieval, sharing, and protection in a distributed environment. It highlights the complexities of DFS, including remote information sharing, user mobility, and high availability, while outlining desirable characteristics such as transparency, performance, and security. Additionally, it covers various distributed file systems like NFS, GFS, and CODA, detailing their architectures, services, and functionalities.

Distributed Systems CSE

DISTRIBUTED FILE SYSTEMS

• A file system performs the organization, storage, retrieval, sharing and protection of files.

• A distributed file system (DFS) is a resource-management component of a distributed operating system.

• It provides storage and retrieval of files in a distributed environment.

• The users and storage devices of a DFS are physically dispersed.

Two main purposes of using files:

1. Permanent storage of information on a secondary storage medium.

2. Sharing of information between applications.

A file system is a subsystem of the operating system that performs file management activities such
as organization, storing, retrieval, naming, sharing, and protection of files.

A file system frees the programmer from concerns about the details of space allocation and layout of
the secondary storage device.

The design and implementation of a distributed file system is more complex than that of a conventional
file system because the users and storage devices are physically dispersed.

In addition to the functions of the file system of a single-processor system, the distributed file system
supports the following:

1. Remote information sharing: Any node, irrespective of the physical location of a file, can access
the file.

2. User mobility: A user should be permitted to work on different nodes.

3. Availability: For better fault-tolerance, files should be available for use even in the event of
temporary failure of one or more nodes of the system. Thus the system should maintain multiple
copies of the files, the existence of which should be transparent to the user.

DISTRIBUTED FILE SYSTEM SERVICES

A distributed file system provides the following types of services:

1. Storage service: Allocation and management of space on a secondary storage device thus
providing a logical view of the storage system.

2. True file service: Includes file-sharing semantics, file-caching mechanism, file replication
mechanism, concurrency control, multiple copy update protocol etc.

3. Name/Directory service: Responsible for directory related activities such as creation and
deletion of directories, adding a new file to a directory, deleting a file from a directory, changing the
name of a file, moving a file from one directory to another etc.

SUTHOJU GIRIJA RANI, Assistant Professor, CSE, NGIT 1


Distributed Systems CSE
Desirable characteristics of a distributed file system

1. Transparency

- Structure transparency

Clients should not know the number or locations of file servers and storage devices. Note: multiple
file servers are provided for performance, scalability, and reliability.

- Access transparency

Both local and remote files should be accessible in the same way. The file system should automatically
locate an accessed file and transport it to the client’s site.

- Naming transparency

The name of a file should give no hint as to its location. The name must not change when the file
moves from one node to another.

- Replication transparency

If a file is replicated on multiple nodes, both the existence of multiple copies and their locations should
be hidden from the clients.

2. User mobility: Automatically bring the user’s environment (e.g. the user’s home directory) to the node
where the user logs in.

3. Performance: Performance is measured as the average amount of time needed to satisfy client
requests. This time includes CPU time, the time for accessing secondary storage, and network access
time. It is desirable that the performance of a distributed file system be comparable to that of a
centralized file system.
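The performance measure above can be illustrated by averaging the three time components over a few requests. The timings are made-up numbers chosen only to show the arithmetic. They are not measurements of any real system.

```python
# Hypothetical timings in milliseconds for three client requests (illustrative only).
requests = [
    {"cpu": 0.5, "disk": 8.0, "network": 1.5},
    {"cpu": 0.5, "disk": 0.0, "network": 1.5},   # e.g. data already in the server's memory
    {"cpu": 1.0, "disk": 9.0, "network": 2.0},
]

def request_time(r):
    """Time to satisfy one request = CPU time + secondary storage time + network time."""
    return r["cpu"] + r["disk"] + r["network"]

def average_time(reqs):
    """The performance measure: mean time needed to satisfy a client request."""
    return sum(request_time(r) for r in reqs) / len(reqs)

print(f"average: {average_time(requests):.2f} ms")
```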

4. Simplicity and ease of use: The user interface to the file system should be simple, and the number of
commands should be as small as possible.

5. Data integrity: Concurrent access requests from multiple users who are competing to access the file
must be properly synchronized by the use of some form of concurrency control mechanism. Atomic
transactions can also be provided.

6. High availability: A distributed file system should continue to function in the face of partial failures
such as a link failure, a node failure, or a storage device crash.

7. High reliability: The probability of loss of stored data should be minimized. The system should
automatically generate backup copies of critical files.

8. Scalability: Growth of nodes and users should not seriously disrupt service.

• A highly reliable and scalable distributed file system should have multiple and independent file
servers controlling multiple and independent storage devices.

9. Security: Users should be confident of the privacy of their data.

10. Heterogeneity: There should be easy access to shared data on diverse platforms (e.g. Unix
workstations, Wintel platforms, etc.).

GOALS OF DISTRIBUTED FILE SYSTEMS

• A DFS has two important goals:

1. Network transparency: Users are not aware of the location of files.

2. High availability: System failures, or failures during regular activities, should not result in
unavailability of files.

NETWORK FILE SYSTEM (NFS)

• NFS is organized as a client-server architecture.

• Network File System (NFS) is a distributed file system developed by Sun Microsystems. It is very
widely used.

• The model underlying NFS and similar systems is that of a remote file service.

• The idea behind NFS is that each file server provides a standardized view of its local file system.

Architecture of Distributed File System

Client-Server Architecture

Goal: Make a file system transparently available to remote clients.

(a) The remote access model. (b) The upload/download model
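The two models in the figure can be contrasted with a minimal sketch. It uses a hypothetical in-memory `FileServer` class that counts the requests it receives. In the remote access model, every read goes to the server. In the upload/download model, the client fetches the whole file once and works on a local copy; after modifying it, the client would send it back with `upload`.

```python
class FileServer:
    """Toy server holding files in memory and counting requests (hypothetical)."""
    def __init__(self, files):
        self.files = dict(files)
        self.requests = 0

    def read(self, name, offset, count):      # remote access model: one request per operation
        self.requests += 1
        return self.files[name][offset:offset + count]

    def download(self, name):                  # upload/download model: whole-file transfer
        self.requests += 1
        return self.files[name]

    def upload(self, name, data):              # send a modified copy back
        self.requests += 1
        self.files[name] = data

def remote_access_count_a(server, name, block=4):
    """Count the letter 'a' by reading the file block by block from the server."""
    total, offset = 0, 0
    while True:
        chunk = server.read(name, offset, block)
        if not chunk:
            break
        total += chunk.count("a")
        offset += block
    return total

def upload_download_count_a(server, name):
    """Fetch the whole file once, then do all the work on the local copy."""
    local_copy = server.download(name)
    return local_copy.count("a")
```

With a 14-character file and 4-byte reads, the remote access version sends five requests, including the final empty read. The upload/download version sends one.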

The basic NFS architecture for UNIX systems

File System Operations

Cluster-Based Distributed File Systems

(a) distributing whole files across several servers

(b) striping files for parallel access
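The striping layout in (b) can be sketched as a round-robin assignment of fixed-size stripe units to servers. The 4-byte `STRIPE_UNIT` and the function names are assumptions chosen to keep the example small. Real systems use much larger units.

```python
STRIPE_UNIT = 4  # bytes per stripe unit (assumption; kept tiny for illustration)

def locate(offset, num_servers, stripe_unit=STRIPE_UNIT):
    """Map a byte offset in a striped file to (server index, offset on that server)."""
    unit = offset // stripe_unit          # which stripe unit holds this byte
    server = unit % num_servers           # units are assigned round-robin
    local_unit = unit // num_servers      # units this server stores before this one
    return server, local_unit * stripe_unit + offset % stripe_unit

def stripe(data, num_servers, stripe_unit=STRIPE_UNIT):
    """Split data into per-server byte strings, round-robin by stripe unit."""
    servers = [bytearray() for _ in range(num_servers)]
    for start in range(0, len(data), stripe_unit):
        servers[(start // stripe_unit) % num_servers] += data[start:start + stripe_unit]
    return [bytes(s) for s in servers]
```

Because consecutive units live on different servers, a large sequential read can be served by all servers in parallel.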


The organization of a Google cluster of servers

Ivy Distributed File System

DHash: look-up keys are computed either from a block's content (content-based) or from a public key
(public-key-based)
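The two kinds of look-up keys can be sketched with a hash function. SHA-1 is assumed here to match Chord's 160-bit identifier space; treat that, and the function names, as assumptions. A content-based key lets any node verify a fetched block by rehashing it. A public-key-based key stays fixed while the owner updates the (signed) block.

```python
import hashlib

def content_hash_key(block: bytes) -> str:
    """Content-based key: hash of the block's own contents (SHA-1 assumed)."""
    return hashlib.sha1(block).hexdigest()

def public_key_key(public_key: bytes) -> str:
    """Public-key-based key: hash of the owner's public key (SHA-1 assumed)."""
    return hashlib.sha1(public_key).hexdigest()

def verify_content_block(key: str, block: bytes) -> bool:
    """A fetched block is genuine only if it hashes to the key it was stored under."""
    return content_hash_key(block) == key
```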

RPC calls in NFS

Communication

(a) Reading data from a file in NFS version 3.

(b) Reading data using a compound procedure in version 4.
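The difference between (a) and (b) lies in the number of request/reply round trips. The toy `ToyNfsServer` below is hypothetical. It supports only a simplified LOOKUP and READ, and in the compound call an argument of `None` means "use the result of the previous operation", loosely mirroring NFSv4's current file handle.

```python
class ToyNfsServer:
    """Hypothetical server counting request/reply round trips."""
    def __init__(self, files):
        self.files = files
        self.round_trips = 0

    def handle(self, op, arg):
        if op == "LOOKUP":                    # returns a "file handle" (here just the name)
            return arg if arg in self.files else None
        if op == "READ":
            return self.files[arg]
        raise ValueError(op)

    def rpc(self, op, arg):
        """Version 3 style: one operation per request/reply exchange."""
        self.round_trips += 1
        return self.handle(op, arg)

    def compound(self, ops):
        """Version 4 style: several operations in one request; stop at the first failure."""
        self.round_trips += 1
        results, current = [], None
        for op, arg in ops:
            current = self.handle(op, current if arg is None else arg)
            results.append(current)
            if current is None:
                break
        return results
```

Reading one file takes two round trips with `rpc` (LOOKUP, then READ) but only one with `compound`.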

Coda RPC2 Subsystem

Side effects in Coda’s RPC2 system allow application-specific protocols to be used during communication.

A file is modified, and all outdated copies need to be invalidated

(a) Sending an invalidation message one at a time.

(b) multicasting: sending invalidation messages in parallel
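The timing difference between (a) and (b) can be simulated with `asyncio`. Each invalidation is modeled as a fixed sleep standing in for one network round trip. This is a simulation, not Coda's RPC2 or MultiRPC interface.

```python
import asyncio
import time

async def invalidate(server, delay):
    """Pretend to send an invalidation to one server and wait for its acknowledgement."""
    await asyncio.sleep(delay)        # simulated round trip
    return server

async def one_at_a_time(servers, delay):
    acks = []
    for s in servers:                 # wait for each reply before sending the next message
        acks.append(await invalidate(s, delay))
    return acks

async def in_parallel(servers, delay):
    # All invalidations are in flight at once, so the total is about one round trip.
    return list(await asyncio.gather(*(invalidate(s, delay) for s in servers)))

def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

With n servers, the sequential version takes roughly n round trips, while the parallel one takes roughly one.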


Files associated with a single TCP connection

Naming & Name Service in NFS

• Naming is a mapping between logical and physical objects. For example, users refer to a file by
a textual name, but it is mapped to disk blocks.

• Path name: Files are named by some combination of machine (host) name and path name. This form of
naming may be used on the server side.

• Mount service: mounts remote directories onto local directories.

A name service is the principal mechanism used in distributed systems for referring to objects from
within applications via a name that identifies each object. Examples are file names, domain names and
so on. The association between a name and an object is called a binding.

A name service is a collection of naming contexts. Its main operation is name resolution.
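Name resolution through nested naming contexts can be sketched with dictionaries. Each context binds names either to objects or to further contexts. The tree and the "inode:…" values below are hypothetical.

```python
# A naming context maps names to objects or to further naming contexts (hypothetical data).
root = {
    "home": {"alice": {"notes.txt": "inode:1042"}},
    "usr": {"bin": {"ls": "inode:7"}},
}

def resolve(context, path):
    """Resolve a slash-separated path name one component at a time."""
    node = context
    for component in path.strip("/").split("/"):
        if not isinstance(node, dict) or component not in node:
            raise LookupError(f"cannot resolve {component!r} in {path!r}")
        node = node[component]        # follow the binding into the next context
    return node
```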

Mounting, Synchronization, File Sharing and locking

Mounting (part of) a remote file system in NFS.

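Mounts can be of three types:

• Soft mount: time-bounded mounting; if the server does not respond in time, the request fails with
an error.

• Hard mount: no time bound; the client keeps retrying until the server responds.

• Automount: on-demand mounting.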
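The difference between soft and hard mounts can be sketched as a retry policy around a single request. Both `send` and `flaky_server` are hypothetical stand-ins for the transport. The real NFS client's timeout and retransmission behaviour is configured through mount options and is more involved.

```python
import itertools

class ServerNotResponding(Exception):
    """Error reported to the application when a soft-mounted server times out."""

def nfs_call(send, mount="hard", retries=3):
    """Issue one request with soft- or hard-mount semantics.
    send() returns the reply or raises TimeoutError (hypothetical transport)."""
    # Hard mount: retry without limit. Soft mount: give up after `retries` attempts.
    attempts = itertools.count() if mount == "hard" else range(retries)
    for _ in attempts:
        try:
            return send()
        except TimeoutError:
            continue
    raise ServerNotResponding(f"no reply after {retries} attempts")

def flaky_server(failures):
    """Hypothetical server that times out `failures` times before answering."""
    calls = {"n": 0}
    def send():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise TimeoutError
        return "reply"
    return send
```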

Automounting (provided on Linux by autofs) is a client-side service that automatically mounts and
unmounts file systems in a distributed system.

A simple automounter for NFS
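A simple automounter can be sketched as follows: it mounts a remote directory on first access and unmounts it once it has been idle longer than a timeout. The export table, server names and the injectable `clock` are hypothetical. The `print` calls stand in for the real mount and unmount actions.

```python
import time

class Automounter:
    """Toy automounter: mount on first access, unmount after idle_timeout seconds."""
    def __init__(self, exports, idle_timeout=300.0, clock=time.monotonic):
        self.exports = exports            # local mount point -> "server:/path"
        self.mounted = {}                 # local mount point -> last access time
        self.idle_timeout = idle_timeout
        self.clock = clock

    def access(self, path):
        for mount_point, remote in self.exports.items():
            if path == mount_point or path.startswith(mount_point + "/"):
                if mount_point not in self.mounted:
                    print(f"mount {remote} on {mount_point}")
                self.mounted[mount_point] = self.clock()
                return remote + path[len(mount_point):]   # where the request is sent
        raise FileNotFoundError(path)

    def expire(self):
        now = self.clock()
        for mount_point, last in list(self.mounted.items()):
            if now - last > self.idle_timeout:
                print(f"unmount {mount_point}")
                del self.mounted[mount_point]
```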

Synchronization - Semantics of File Sharing

On a single processor, when a read follows a write, the value returned by the read is the value just
written.

In a distributed system with caching, obsolete values may be returned.

Four ways of dealing with shared files in a distributed system.
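The figure listing the four approaches is not reproduced here. The sketch contrasts two commonly cited options. Under UNIX semantics, every write is immediately visible to all readers. Under session semantics, a client works on a private copy, and its changes become visible only when it closes the file. The classes are hypothetical.

```python
class SharedFile:
    """The server's copy of a file."""
    def __init__(self, data=""):
        self.data = data

class UnixClient:
    """UNIX semantics: a write is instantly visible to every reader."""
    def __init__(self, f):
        self.f = f
    def write(self, data):
        self.f.data = data
    def read(self):
        return self.f.data

class SessionClient:
    """Session semantics: changes reach the server only when the file is closed."""
    def __init__(self, f):
        self.f = f
        self.copy = f.data                # private copy made at open
    def write(self, data):
        self.copy = data
    def read(self):
        return self.copy
    def close(self):
        self.f.data = self.copy           # changes become visible only now
```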


File Locking
NFSv4 operations related to file locking
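The lock-related NFSv4 operations include LOCK (acquire a byte-range lock), LOCKT (test for a conflicting lock) and LOCKU (release a lock). The toy lock table below borrows those names for its methods. Its data model is hypothetical: it handles only exclusive locks, and leases and lock renewal are omitted.

```python
class LockTable:
    """Toy byte-range lock manager; methods mirror LOCK, LOCKT and LOCKU (hypothetical model)."""
    def __init__(self):
        self.locks = []                   # (owner, start, end), end exclusive; exclusive locks only

    def _conflict(self, owner, start, end):
        for o, s, e in self.locks:
            if o != owner and start < e and s < end:    # overlapping range held by another owner
                return (o, s, e)
        return None

    def lockt(self, owner, start, length):
        """Test: return the conflicting lock, or None if the range is free."""
        return self._conflict(owner, start, start + length)

    def lock(self, owner, start, length):
        if self._conflict(owner, start, start + length):
            return False                  # a real server would answer NFS4ERR_DENIED
        self.locks.append((owner, start, start + length))
        return True

    def locku(self, owner, start, length):
        self.locks.remove((owner, start, start + length))
```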

File Sharing in Coda


The transactional behavior in sharing files in Coda

Caching and Replication in Distributed File System

• Client-side caching

• Caching in NFS

• Caching in Coda

• Server-side replication

• Server replication in Coda

NFS Client-Side Caching

NFSv4 uses a callback mechanism to recall a file delegation.

File delegation: the NFS server grants a client the right to handle access to a file locally until the
server recalls the delegation.
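Delegation and recall can be sketched in a few lines. The client that holds a delegation serves opens and writes from its cache. When another client opens the file, the server calls back the holder, which flushes its changes and gives up the delegation. This toy version has a single kind of delegation; real NFSv4 distinguishes read and write delegations and recalls them with the CB_RECALL callback.

```python
class DelegatingServer:
    def __init__(self):
        self.data = {"f": "v1"}
        self.delegations = {}             # file name -> client holding the delegation

    def open(self, client, name):
        holder = self.delegations.get(name)
        if holder is not None and holder is not client:
            holder.recall(name)           # callback: ask the holder to return the delegation
        self.delegations[name] = client
        return self.data[name]

class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, name):
        if name not in self.cache:        # while delegated, opens are handled locally
            self.cache[name] = self.server.open(self, name)
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data           # modified locally; the server is not contacted

    def recall(self, name):
        self.server.data[name] = self.cache.pop(name)   # flush changes back to the server
        del self.server.delegations[name]
```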

Client-Side Caching in Coda

The use of local copies when opening a session in Coda.


Server Replication in Coda

Coda uses a variant of a replicated-write protocol, ROWA (Read-One, Write-All).

Two clients with different AVSGs (Accessible Volume Storage Groups) for the same replicated file.
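When clients have different AVSGs, for example after a network partition, Coda detects conflicting updates with Coda version vectors (CVVs). Each vector holds one update counter per server. The sketch is a simplification: a write is applied to every server in the writer's AVSG, and two vectors are compared element-wise.

```python
def write(cvv, avsg):
    """ROWA-style update: each server in the writer's AVSG applies it and bumps its counter."""
    return [c + 1 if i in avsg else c for i, c in enumerate(cvv)]

def compare(v1, v2):
    """Compare two version vectors element-wise."""
    ge = all(a >= b for a, b in zip(v1, v2))
    le = all(a <= b for a, b in zip(v1, v2))
    if ge and le:
        return "equal"
    if ge:
        return "first is newer"
    if le:
        return "second is newer"
    return "conflict"             # concurrent updates in different partitions

start = [1, 1, 1]                 # three servers, all up to date
a_version = write(start, {0, 1})  # client A can reach servers 0 and 1
b_version = write(start, {2})     # client B can reach only server 2
```

Here `compare(a_version, b_version)` reports a conflict, which has to be resolved when the partition heals.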

Handling Byzantine Failures

To tolerate k faulty replicas, 3k + 1 replicas are needed.

A Byzantine failure in a distributed system occurs when a node provides incorrect or misleading
information to other nodes. The problem of reaching agreement despite such failures is known as the
Byzantine generals problem or the Byzantine agreement problem.
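Two pieces of arithmetic from Byzantine fault tolerance can be sketched: the replica count, and a client-side acceptance rule. The rule waits for k + 1 identical replies, since at most k replicas can be faulty and so at least one of those replies comes from a correct replica. The function names are hypothetical.

```python
from collections import Counter

def replicas_needed(k):
    """Minimum number of replicas that tolerates k Byzantine (arbitrarily faulty) replicas."""
    return 3 * k + 1

def accept_reply(replies, k):
    """Accept a value once k + 1 replicas report it; otherwise keep waiting (None).
    Assumes at least one reply has arrived."""
    value, votes = Counter(replies).most_common(1)[0]
    return value if votes >= k + 1 else None
```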

The different phases in Byzantine fault tolerance


Security in NFS - The NFS security architecture

Secure RPCs in NFSv4

Access Control

The various kinds of users and processes distinguished by NFS with respect to access control.

Secure Collaborative Storage

Storage claims in the peer-to-peer system

Google File System (GFS)

• Each GFS cluster consists of a single master along with multiple chunk servers.

• Each GFS file is divided into chunks of 64 MB each, and these chunks are distributed across what are
called chunk servers.

• An important observation is that a GFS master is contacted only for metadata information.

• In particular, a GFS client passes a file name and chunk index to the master, expecting a
contact address for the chunk.

• The contact address contains all the information to access the correct chunk server to obtain
the required file chunk.
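The client-side lookup described above can be sketched as follows. The client turns a byte offset into a chunk index and asks the master only for metadata. It then reads the data directly from a chunk server. The master table, chunk handles and server names are hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks

# Hypothetical master metadata: (file name, chunk index) -> (chunk handle, chunk servers)
master_table = {
    ("/logs/web.log", 0): ("handle-0001", ["cs1", "cs4", "cs7"]),
    ("/logs/web.log", 1): ("handle-0002", ["cs2", "cs4", "cs9"]),
}

def chunk_index(offset):
    return offset // CHUNK_SIZE

def ask_master(name, index):
    """The master returns metadata only; it never sends file data."""
    return master_table[(name, index)]

def locate(name, offset):
    """Return what the client needs to contact a chunk server directly."""
    handle, servers = ask_master(name, chunk_index(offset))
    return handle, servers, offset % CHUNK_SIZE     # offset within the chunk
```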

Cluster-Based Distributed File Systems - The organization of a Google cluster of servers

Design Considerations

Sun NFS, VFS & AFS

Sun NFS : Developed by Sun Microsystems

It allows a remote client to access a file system over a network.

It is a client-server application that uses RPC to route requests between client and server.
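The RPC idea can be illustrated with Python's standard-library XML-RPC as a stand-in. NFS itself uses Sun (ONC) RPC with XDR encoding, so this only shows the principle: the client calls what looks like a local function, and the call runs on the server. The `read` procedure is hypothetical. Like NFS versions 2 and 3, it is stateless: every call carries the path, offset and count.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

FILES = {"/export/readme.txt": "hello from the server"}   # hypothetical exported file

def read(path, offset, count):
    """Stateless read: every call carries all the information the server needs."""
    return FILES[path][offset:offset + count]

def start_server():
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)  # port 0: any free port
    server.register_function(read, "read")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def remote_read(path, offset, count):
    server = start_server()
    try:
        host, port = server.server_address
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            return proxy.read(path, offset, count)    # looks like a local call
    finally:
        server.shutdown()
        server.server_close()
```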


Virtual File System:

File handle: the file identifier used in NFS is called a file handle.

The VFS layer is used to distinguish between local and remote files.
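The VFS's role as a switch between local and remote files can be sketched as a mount table. Each call is routed to the file system object that owns the longest matching mount point. The classes below are hypothetical and greatly simplified. The real VFS works with v-nodes, not path strings.

```python
class LocalFS:
    def __init__(self, files):
        self.files = files
    def read(self, path):
        return self.files[path]

class NFSClient:
    """Stands in for the NFS client module, which would send RPCs to a remote server."""
    def __init__(self, server_name, remote_files):
        self.server_name = server_name
        self.remote_files = remote_files
    def read(self, path):
        return self.remote_files[path]    # a real client would issue an NFS READ here

class VFS:
    """Routes each call to the file system that owns the path."""
    def __init__(self):
        self.mounts = {}                  # mount point -> file system object
    def mount(self, point, fs):
        self.mounts[point] = fs
    def read(self, path):
        # Longest matching mount point wins, as in a mount table.
        point = max((p for p in self.mounts
                     if path == p or path.startswith(p.rstrip("/") + "/")), key=len)
        rel = path if point == "/" else path[len(point):]
        return self.mounts[point].read("/" + rel.lstrip("/"))
```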

SUTHOJU GIRIJA RANI, Assistant Professor, CSE, NGIT 2


Distributed Systems CSE

SUTHOJU GIRIJA RANI, Assistant Professor, CSE, NGIT 3


Distributed Systems CSE

SUTHOJU GIRIJA RANI, Assistant Professor, CSE, NGIT 4


Distributed Systems CSE

• The Andrew File System (AFS) is a distributed file system that uses a set of remote servers to store
files and give clients access to them.

• AFS uses a local cache to reduce the workload on the servers and increase the performance of the
distributed computing environment.
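AFS caching can be sketched with its two characteristic ideas: whole-file caching on the client, and callback promises. When a server hands out a file, it promises to notify the client if the file changes. A client can therefore reuse its cached copy without contacting the server until the callback is broken. The classes below are a hypothetical simplification.

```python
class AfsServer:
    def __init__(self, files):
        self.files = dict(files)
        self.callbacks = {}               # file -> clients holding a callback promise

    def fetch(self, client, name):
        self.callbacks.setdefault(name, set()).add(client)   # promise to notify on change
        return self.files[name]

    def store(self, writer, name, data):
        self.files[name] = data
        for c in self.callbacks.pop(name, set()) - {writer}:
            c.break_callback(name)        # other cached copies are now stale
        self.callbacks[name] = {writer}

class AfsClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}                   # whole-file cache
        self.fetches = 0

    def open(self, name):
        if name not in self.cache:        # a valid cached copy needs no server contact
            self.cache[name] = self.server.fetch(self, name)
            self.fetches += 1
        return self.cache[name]

    def close_after_write(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)   # the whole file is written back on close

    def break_callback(self, name):
        self.cache.pop(name, None)
```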

CODA

• CONTENT DELIVERY ARCHITECTURE

• COMMON DATA AVAILABILITY

Coda's architecture is based on the AFS architecture.

Coda is a file system for a large-scale distributed computing environment.

Coda optimizes for availability, performance and the highest possible degree of consistency.

It provides resilience to server and network failures through two mechanisms: server replication and
disconnected operation.
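Disconnected operation can be sketched as follows. While disconnected, the client applies updates to its cached copies and records them in a log. On reconnection, it replays the log to the servers, a step Coda calls reintegration. This toy class is hypothetical and omits the conflict detection that real reintegration performs.

```python
class CodaClient:
    """Toy client illustrating disconnected operation (no conflict handling)."""
    def __init__(self, server_files):
        self.server = server_files        # stands in for the replicated servers
        self.cache = dict(server_files)   # copies cached (hoarded) before disconnection
        self.connected = True
        self.log = []                     # updates made while disconnected

    def read(self, name):
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data
        if self.connected:
            self.server[name] = data
        else:
            self.log.append((name, data))

    def disconnect(self):
        self.connected = False

    def reconnect(self):
        self.connected = True
        for name, data in self.log:       # reintegration: replay logged updates
            self.server[name] = data
        self.log.clear()
```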


Security in CODA

The Coda security architecture consists of two parts, which deal with:

• Setting up a secure channel between a client and a server using RPC-level authentication

• Controlling access to files

Utilization of CODA

• Organizations such as Carnegie Mellon University (CMU) use Coda on their campus and are making a
serious effort to improve it in the following areas:

• Reliability and performance

• Ports to important platforms

• Documentation and mailing groups

• Extensions in functionality

DISTRIBUTED WEB BASED SYSTEMS

• The World Wide Web (WWW) can be viewed as a huge distributed system consisting of millions of
clients and servers for accessing linked documents.

• Servers maintain collections of documents, while clients provide users with an easy-to-use interface
for presenting and accessing these documents.

Traditional Web-Based Systems

The overall organization of a traditional Web site


Six top-level MIME types: text, audio, video, image, application and multipart.
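A MIME type has the form top-level type/subtype. Python's standard `mimetypes` module can guess a type from a file extension. Multipart types describe messages made of several parts, so they are not guessed from file names. The helper `top_level_type` is hypothetical.

```python
import mimetypes

def top_level_type(filename):
    """Return the top-level MIME type guessed from a file name's extension, or None."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime.split("/")[0] if mime else None

for name in ["index.html", "photo.png", "report.pdf"]:
    print(name, mimetypes.guess_type(name)[0], top_level_type(name))
```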

Processes and Communication in Distributed Web-based Systems

The logical components of a Web browser


The principle of using a server cluster in combination with a front end to implement a Web service

A scalable content-aware cluster of Web servers
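A content-aware front end can be sketched as a router. It inspects each request URL and chooses a back-end pool by content class, rotating round-robin within the pool. The classification rules and server names are assumptions for illustration.

```python
import itertools

class ContentAwareFrontEnd:
    """Toy front end routing requests to back-end pools by content class (hypothetical rules)."""
    def __init__(self, pools):
        # pools: content class -> list of back-end servers
        self.cycles = {kind: itertools.cycle(servers) for kind, servers in pools.items()}

    @staticmethod
    def classify(url):
        if url.endswith((".jpg", ".png", ".gif")):
            return "images"
        if url.endswith((".php", ".cgi")) or "?" in url:
            return "dynamic"
        return "static"

    def route(self, url):
        return next(self.cycles[self.classify(url)])   # round robin within the chosen pool
```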

Web Proxy Caching, Replication & CDNs

• Web proxy caching in a distributed system refers to using proxy servers to store and manage cached
web content across multiple locations within a network.

• In a distributed system, each proxy server caches copies of frequently accessed web content.
This means that when multiple users request the same content, the proxy server can deliver it
from its cache rather than fetching it from the original web server every time.

Components

• Clients: These are the end-user devices (computers, smartphones, tablets) that make HTTP
GET requests for web content.

• Web Proxy: Acts as an intermediary between clients and web servers. It handles client
requests, retrieves content from its local cache if available, or forwards the request if
necessary.

• Cache: Storage within the proxy server where cached web content is saved.

• Neighboring Proxy Caches: Other proxy servers in the network that can be queried if the
requested content is not found locally.

• Web Server: The original server that hosts the requested web content.
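The components above can be combined into a small lookup sketch. A proxy handles a GET by checking its own cache first, then its neighboring proxy caches, and only then the origin web server. The `Proxy` class, the status strings and the dictionary standing in for the origin server are hypothetical.

```python
class Proxy:
    """Toy caching proxy with cooperating neighbor caches (hypothetical)."""
    def __init__(self, name, origin, neighbors=()):
        self.name = name
        self.origin = origin              # dict standing in for the origin web server
        self.neighbors = list(neighbors)
        self.cache = {}

    def lookup_local(self, url):
        return self.cache.get(url)

    def get(self, url):
        """Handle an HTTP GET: local cache, then neighboring caches, then the origin."""
        if url in self.cache:
            return self.cache[url], f"hit at {self.name}"
        for n in self.neighbors:
            body = n.lookup_local(url)
            if body is not None:
                self.cache[url] = body
                return body, f"hit at neighbor {n.name}"
        body = self.origin[url]           # miss everywhere: fetch from the web server
        self.cache[url] = body
        return body, "fetched from origin"
```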


CDN: Content Delivery Networks

Content providers are the customers of CDN services.

A CDN performs load balancing at two levels: local and global.
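The two levels can be sketched as two successive choices. At the global level, the CDN picks a site for the client, here simply the closest one, as might be done through DNS redirection. At the local level, it picks a server within that site, here the least loaded. The site layout, distances and loads are hypothetical numbers.

```python
# Hypothetical CDN layout: network distance to the client, and current load per server.
sites = {
    "mumbai":    {"distance": 5,  "servers": {"m1": 0.7, "m2": 0.2}},
    "singapore": {"distance": 30, "servers": {"s1": 0.1}},
}

def global_balance(sites):
    """Global level: choose the site closest to the client."""
    return min(sites, key=lambda s: sites[s]["distance"])

def local_balance(servers):
    """Local level: within a site, choose the least-loaded server."""
    return min(servers, key=servers.get)

site = global_balance(sites)
print(site, local_balance(sites[site]["servers"]))
```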

HTTP, SOAP, SOA, REST & Web Services

Communication can use HTTP (HyperText Transfer Protocol) and SOAP (Simple Object Access Protocol).

HTTP is a client-server protocol: communication between clients and servers is based on HTTP.

HTTP includes HTTP connections, HTTP methods and HTTP messages.
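The three parts of HTTP (connections, methods and messages) show up in a small local round trip using Python's standard `http.server` and `http.client`. The handler and its reply text are hypothetical. The server runs on a free local port only for the duration of the request.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):                                   # handles the GET method
        body = f"you asked for {self.path}".encode()
        self.send_response(200)                         # status line of the response message
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                              # blank line ends the headers
        self.wfile.write(body)                          # message body

    def log_message(self, *args):                      # keep the demo quiet
        pass

def fetch(path):
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection(*server.server_address)   # one TCP connection
        conn.request("GET", path)                       # method + request URI
        resp = conn.getresponse()
        result = resp.status, resp.getheader("Content-Type"), resp.read().decode()
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```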

Service-Oriented Architecture (SOA)

• Service-Oriented Architecture (SOA) is a stage in the evolution of application development and/or
integration. It defines a way to make software components reusable through their interfaces.

SOA is different from micro-service architecture.

• SOA allows users to combine a large number of facilities from existing services to form
applications.

• SOA encompasses a set of design principles that structure system development and provide
means for integrating components into a coherent and decentralized system.

• SOA-based computing packages functionalities into a set of interoperable services, which can
be integrated into different software systems belonging to separate business domains.

Characteristics of SOA

• Provides interoperability between the services.

• Provides methods for service encapsulation, service discovery, service composition, service
reusability and service integration.

• Facilitates QoS (Quality of Service) through service contracts based on Service Level Agreements
(SLAs).

• Provides loosely coupled services.

• Provides location transparency with better scalability and availability.

• Ease of maintenance with reduced cost of application development and deployment.

A component in an SOA can take the role of service provider, service consumer, or both.
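Service discovery, composition, and the provider/consumer roles can be sketched with a toy in-process registry. Providers publish services under a name, and consumers discover them by name alone. A composite service consumes two services and is itself published as a provider. The registry, service names and the 18% tax rate are hypothetical; a real SOA would use network endpoints and service contracts.

```python
class ServiceRegistry:
    """Toy registry supporting publication and discovery (hypothetical)."""
    def __init__(self):
        self.services = {}

    def publish(self, name, endpoint):      # provider role: advertise a service
        self.services[name] = endpoint

    def discover(self, name):               # consumer role: find a service by name only
        return self.services[name]

registry = ServiceRegistry()
registry.publish("currency.convert", lambda amount, rate: round(amount * rate, 2))
registry.publish("tax.compute", lambda amount: round(amount * 0.18, 2))   # hypothetical 18% rate

def invoice_total(amount, rate):
    """Composite service: consumes two services and is itself published below."""
    converted = registry.discover("currency.convert")(amount, rate)
    return round(converted + registry.discover("tax.compute")(converted), 2)

registry.publish("invoice.total", invoice_total)
```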

REST

• REST (REpresentational State Transfer) is an architectural style for developing web services and
systems that can easily communicate with each other. REST is popular due to its simplicity and the
fact that it builds upon existing systems and features of the internet's HTTP to achieve its objectives,
as opposed to creating new standards, frameworks and technologies.

• It is popularly believed that REST is a protocol or standard. However, it is neither. REST is an
architectural style that is commonly adopted for building web-based application programming
interfaces (APIs).

Advantages of REST

• Resource-based. REST enforces statelessness through resources rather than commands, improving
reliability, performance and scalability.

• Simple interface. In REST, each resource involved in client-server interactions is identified and
is uniformly represented in the server response to define a consistent and simple interface for
all interactions.

• Familiar constructs. REST interactions are based on constructs that are familiar to anyone
accustomed to using HTTP, including operations (GET, POST, DELETE, etc.) and URIs. That said,
REST and HTTP are not the same and developers must note the differences when
implementing and using REST.

• Communication. The status of REST-based interactions between the server and clients is
communicated through numerical HTTP status codes.
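The advantages above can be seen in a small REST-style handler. Resources are identified by URIs and manipulated with standard HTTP methods. Every reply carries a numeric HTTP status code, and no client session state is kept between requests. The `ResourceStore` class is a hypothetical in-memory sketch, not a web framework.

```python
class ResourceStore:
    """Toy REST-style handler: URIs identify resources; replies carry HTTP status codes."""
    def __init__(self):
        self.resources = {}

    def handle(self, method, uri, body=None):
        if method == "GET":
            if uri in self.resources:
                return 200, self.resources[uri]       # OK
            return 404, None                          # Not Found
        if method == "PUT":
            created = uri not in self.resources
            self.resources[uri] = body
            return (201 if created else 200), body    # Created / OK
        if method == "DELETE":
            if uri not in self.resources:
                return 404, None
            del self.resources[uri]
            return 204, None                          # No Content
        return 405, None                              # Method Not Allowed
```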

WEB SERVICES: Services available over the web are web services. They are described using WSDL (Web
Services Description Language).

The principle of a Web service
