Dist_Sys_Unit_4_Notes
• A file system performs the organization, storage, retrieval, sharing, and protection of files.
A file system is a subsystem of the operating system that performs file management activities such
as organization, storing, retrieval, naming, sharing, and protection of files.
A file system frees the programmer from concerns about the details of space allocation and layout of
the secondary storage device.
The design and implementation of a distributed file system is more complex than that of a conventional
file system because the users and storage devices are physically dispersed.
In addition to the functions of the file system of a single-processor system, the distributed file system
supports the following:
1. Remote information sharing: Any node, irrespective of the physical location of the file, can
access the file.
3. Availability: For better fault-tolerance, files should be available for use even in the event of
temporary failure of one or more nodes of the system. Thus the system should maintain multiple
copies of the files, the existence of which should be transparent to the user.
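The availability requirement above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Replica` class, the `read_file` helper, and the node names are hypothetical, and a real system would also have to keep the copies consistent. The point is that the caller asks for a path and never sees which copy answered.

```python
class ReplicaUnavailable(Exception):
    """Raised when a replica (or every replica) cannot answer (hypothetical)."""


class Replica:
    """One node holding a copy of the files (hypothetical model)."""

    def __init__(self, name, files, up=True):
        self.name = name
        self.files = dict(files)
        self.up = up

    def read(self, path):
        if not self.up:
            raise ReplicaUnavailable(self.name)
        return self.files[path]


def read_file(path, replicas):
    """Return the contents from the first replica that is reachable."""
    for replica in replicas:
        try:
            return replica.read(path)
        except ReplicaUnavailable:
            continue  # this copy is down, try the next one
    raise ReplicaUnavailable(f"no replica available for {path}")


replicas = [
    Replica("node-a", {"/notes.txt": b"unit 4"}, up=False),  # temporarily failed
    Replica("node-b", {"/notes.txt": b"unit 4"}),
]
data = read_file("/notes.txt", replicas)  # served by node-b
```

The read succeeds even though node-a is down, and the caller's code is the same as it would be with a single copy.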
1. Storage service: Allocation and management of space on a secondary storage device thus
providing a logical view of the storage system.
2. True file service: Includes file-sharing semantics, file-caching mechanism, file replication
mechanism, concurrency control, multiple copy update protocol etc.
3. Name/Directory service: Responsible for directory related activities such as creation and
deletion of directories, adding a new file to a directory, deleting a file from a directory, changing the
name of a file, moving a file from one directory to another etc.
1. Transparency
- Structure transparency
Clients should not need to know the number or locations of the file servers and storage devices. Note:
multiple file servers are provided for performance, scalability, and reliability.
- Access transparency
Both local and remote files should be accessible in the same way. The file system should automatically
locate an accessed file and transport it to the client’s site.
- Naming transparency
The name of the file should give no hint as to the location of the file. The name of the file must not be
changed when moving from one node to another.
- Replication transparency
If a file is replicated on multiple nodes, both the existence of multiple copies and their locations should
be hidden from the clients.
2. User mobility: Automatically bring the user’s environment (e.g. the user’s home directory) to the node
where the user logs in.
3. Performance: Performance is measured as the average amount of time needed to satisfy client
requests. This time includes CPU time + time for accessing secondary storage + network access time. It
is desirable that the performance of a distributed file system be comparable to that of a centralized
file system.
4. Simplicity and ease of use: The user interface to the file system should be simple, and the number of
commands should be as small as possible.
5. Data integrity: Concurrent access requests from multiple users who are competing to access the file
must be properly synchronized by the use of some form of concurrency control mechanism. Atomic
transactions can also be provided.
6. High availability: A distributed file system should continue to function in the face of partial failures
such as a link failure, a node failure, or a storage device crash.
7. High reliability: The probability of loss of stored data should be minimized. The system should
automatically generate backup copies of critical files.
8. Scalability: Growth of nodes and users should not seriously disrupt service.
• A highly reliable and scalable distributed file system should have multiple and independent file
servers controlling multiple and independent storage devices.
2. High Availability: System failures or failures in regular activities should not result in the
unavailability of files.
• Network File System (NFS) is a widely used distributed file system developed by Sun
Microsystems.
• The model underlying NFS and similar systems is that of a remote file service.
• The idea behind NFS is that each file server provides a standardized view of its local file system.
Client-Server Architecture
Communication
Side effects in Coda’s RPC2 system allow application-specific protocols to be used during communication.
• Naming is a mapping between logical and physical objects. For example, users refer to a file by
a textual name, but it is mapped to disk blocks.
• Path name: Files are named by some combination of machine or host name and path name.
This may be used on the server side.
A name service is the principal mechanism used in distributed systems for referring to an object
within an application by a name that identifies it. Examples of names are file names and domain names.
The association between a name and an object is called a binding.
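A binding can be pictured as an entry in a lookup table. The sketch below is a minimal in-memory illustration, not the interface of any real name service; the class `NameService` and its `bind`, `resolve`, and `unbind` methods are names chosen here for the example.

```python
class NameService:
    """A toy name service: a table of name-to-object bindings (hypothetical)."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, obj):
        """Create a binding; refuse to silently replace an existing one."""
        if name in self._bindings:
            raise KeyError(f"{name!r} is already bound")
        self._bindings[name] = obj

    def resolve(self, name):
        """Return the object bound to the name (KeyError if unbound)."""
        return self._bindings[name]

    def unbind(self, name):
        """Remove the binding for the name."""
        del self._bindings[name]


names = NameService()
names.bind("/home/alice/notes.txt", {"server": "fs1", "inode": 4711})
location = names.resolve("/home/alice/notes.txt")
```

The client only ever handles the textual name; where the object actually lives is recorded in the binding and can change without the name changing.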
Automounting, also known as autofs, is a client-side service that automatically mounts and unmounts
file systems in a distributed system.
On a single processor, when a read follows a write, the value returned by the read is the value just written.
File Locking
NFSv4 operations related to file locking
• Client-side caching
• Caching in NFS
• Caching in Coda
• Server-side replication
File delegation: the process of granting a client the right to access a file on an NFS server.
Two clients with a different AVSG for the same replicated file.
A Byzantine failure in a distributed system occurs when a node provides incorrect or misleading
information to other nodes. The problem of reaching agreement in the presence of such failures is
known as the Byzantine generals problem or the Byzantine agreement problem.
The various kinds of users and processes distinguished by NFS with respect to access control.
• Each GFS cluster consists of a single master along with multiple chunk servers.
• Each GFS file is divided into chunks of 64 MB each, after which these chunks are distributed
across what are called chunk servers.
• An important observation is that a GFS master is contacted only for metadata information.
• In particular, a GFS client passes a file name and chunk index to the master, expecting a
contact address for the chunk.
• The contact address contains all the information to access the correct chunk server to obtain
the required file chunk.
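The lookup described in the bullets above can be sketched as follows. This is an illustration under assumptions, not GFS code: the `Master` class, the `locate` helper, the file name, and the address `chunkserver-7:7000` are all hypothetical, and 64 MB is taken to mean 64 × 2^20 bytes. The client turns a byte offset into a chunk index, asks the master only for the contact address, and would then read the data from the chunk server directly.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumed: 64 MB = 64 * 2**20 bytes


class Master:
    """Holds metadata only: (file name, chunk index) -> contact address."""

    def __init__(self, chunk_table):
        self._chunk_table = dict(chunk_table)

    def lookup(self, file_name, chunk_index):
        return self._chunk_table[(file_name, chunk_index)]


def locate(master, file_name, offset):
    """Map a byte offset to (chunk index, contact address, offset in chunk)."""
    chunk_index = offset // CHUNK_SIZE
    address = master.lookup(file_name, chunk_index)
    return chunk_index, address, offset % CHUNK_SIZE


# Hypothetical metadata for one chunk of one file.
master = Master({("/logs/web.log", 3): "chunkserver-7:7000"})

# Byte offset 200 MB falls in chunk 3, 8 MB past the start of that chunk.
index, address, inner = locate(master, "/logs/web.log", 200 * 1024 * 1024)
```

Because the master answers only this small metadata query, the file data itself never flows through it.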
Design Considerations
SUTHOJU GIRIJA RANI, Assistant Professor, CSE, NGIT 1
Distributed Systems CSE
It is a client-server application. It uses RPC to route requests between the client and the server.
• The Andrew File System (AFS) is a distributed file system that uses a set of remote servers to
provide access to files.
• AFS uses a local cache to reduce the workload and increase the performance of a distributed
computing environment.
Security in CODA
• Setting up a secure channel between client and a server using RPC system level authentication
Utilization of CODA
• Many organizations, such as Carnegie Mellon University (CMU), use Coda on their campuses and are
making a serious effort to improve Coda in the following areas:
• Extensions in functionality
• The World Wide Web (WWW) can be viewed as a huge distributed system consisting of millions of
clients and servers for accessing linked documents.
• Servers maintain collections of documents, while clients provide users with an easy-to-use interface
for presenting and accessing these documents.
Six top-level MIME types: Text, Image, Audio, Video, Application, and Multipart.
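As a small illustration of how a top-level type appears in practice, the sketch below uses Python's standard `mimetypes` module to guess a full MIME type from a file name and then keeps only the part before the slash. The helper name `top_level_type` and the file names are made up for this example, and the result depends on the extension table available on the machine.

```python
import mimetypes


def top_level_type(filename):
    """Return the top-level MIME type guessed from a file name, or None."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        return None  # unknown or missing extension
    # A MIME type has the form "type/subtype"; keep the top-level part.
    return mime_type.split("/", 1)[0]


kind_of_notes = top_level_type("notes.txt")  # text/plain -> "text"
kind_of_photo = top_level_type("photo.png")  # image/png  -> "image"
```

A browser uses the same type/subtype pair, sent by the server with the document, to decide how the document should be presented.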
The principle of using a server cluster in combination with a front end to implement a Web service
• Web proxy caching in a distributed system refers to the method of using proxy servers to store
and manage cached web content across multiple locations within a network.
• In a distributed system, each proxy server caches copies of frequently accessed web content.
This means that when multiple users request the same content, the proxy server can deliver it
from its cache rather than fetching it from the original web server every time.
Components
• Clients: These are the end-user devices (computers, smartphones, tablets) that make HTTP
GET requests for web content.
• Web Proxy: Acts as an intermediary between clients and web servers. It handles client
requests, retrieves content from its local cache if available, or forwards the request if
necessary.
• Cache: Storage within the proxy server where cached web content is saved.
• Neighboring Proxy Caches: Other proxy servers in the network that can be queried if the
requested content is not found locally.
• Web Server: The original server that hosts the requested web content.
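The way these components cooperate can be sketched as follows. This is a simplified model under assumptions, not a real proxy: the classes `OriginServer` and `Proxy`, the method names, and the URL are hypothetical, and cache expiry and validation are left out. A proxy answers from its own cache if it can, otherwise asks its neighboring caches, and only then contacts the web server.

```python
class OriginServer:
    """The web server that hosts the original content (hypothetical model)."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.requests = 0  # how often the origin had to be contacted

    def get(self, url):
        self.requests += 1
        return self.pages[url]


class Proxy:
    """A caching web proxy that can query neighboring proxy caches."""

    def __init__(self, origin, neighbors=()):
        self.origin = origin
        self.neighbors = list(neighbors)
        self.cache = {}

    def lookup(self, url):
        """Answer a neighbor's query from the local cache only."""
        return self.cache.get(url)

    def get(self, url):
        if url in self.cache:                # 1. local cache hit
            return self.cache[url]
        for neighbor in self.neighbors:      # 2. ask neighboring caches
            body = neighbor.lookup(url)
            if body is not None:
                break
        else:                                # 3. fall back to the web server
            body = self.origin.get(url)
        self.cache[url] = body
        return body


origin = OriginServer({"/index.html": "<h1>Unit 4</h1>"})
proxy_a = Proxy(origin)
proxy_b = Proxy(origin, neighbors=[proxy_a])

proxy_a.get("/index.html")         # miss everywhere: fetched from the origin
page = proxy_b.get("/index.html")  # served by neighbor proxy_a's cache
```

After both requests the origin has been contacted only once, which is the saving that cooperative proxy caching aims for.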
Connections can be made using HTTP (Hypertext Transfer Protocol) and SOAP (Simple Object Access
Protocol).
HTTP is a client-server protocol. Communication between clients and servers is based on HTTP.
• SOA allows users to combine a large number of facilities from existing services to form
applications.
• SOA encompasses a set of design principles that structure system development and provide
means for integrating components into a coherent and decentralized system.
• SOA-based computing packages functionalities into a set of interoperable services, which can
be integrated into different software systems belonging to separate business domains.
Characteristics of SOA
• Provides methods for service encapsulation, service discovery, service composition, service
reusability and service integration.
In SOA, a component can take the role of service provider, service consumer, or both, as required.
REST
• Simple interface. In REST, each resource involved in client-server interactions is identified and
is uniformly represented in the server response to define a consistent and simple interface for
all interactions.
• Familiar constructs. REST interactions are based on constructs that are familiar to anyone
accustomed to using HTTP, including operations (GET, POST, DELETE, etc.) and URIs. That said,
REST and HTTP are not the same and developers must note the differences when
implementing and using REST.
• Communication. The status of REST-based interactions between the server and clients is
communicated through numerical HTTP status codes.
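The three points above can be illustrated with a small in-memory sketch. It is not a web server and not the API of any framework: the class `NoteCollection`, its `handle` method, and the `/notes` URIs are assumptions made for this example. Only `http.HTTPStatus` comes from the Python standard library. Each resource is identified by a URI, the operations are the familiar HTTP methods, and every outcome is reported as a numerical status code.

```python
from http import HTTPStatus


class NoteCollection:
    """A toy REST-style resource collection rooted at /notes (hypothetical)."""

    def __init__(self):
        self._notes = {}   # URI -> representation
        self._next_id = 1

    def handle(self, method, uri, body=None):
        """Return (status code, payload) for one request."""
        if method == "POST" and uri == "/notes":
            new_uri = f"/notes/{self._next_id}"
            self._next_id += 1
            self._notes[new_uri] = body
            return HTTPStatus.CREATED, new_uri        # 201 plus the new URI
        if method == "GET":
            if uri in self._notes:
                return HTTPStatus.OK, self._notes[uri]  # 200
            return HTTPStatus.NOT_FOUND, None           # 404
        if method == "DELETE":
            if uri not in self._notes:
                return HTTPStatus.NOT_FOUND, None
            del self._notes[uri]
            return HTTPStatus.NO_CONTENT, None          # 204
        return HTTPStatus.METHOD_NOT_ALLOWED, None      # 405


api = NoteCollection()
status_created, note_uri = api.handle("POST", "/notes", "read chapter 11")
status_ok, note = api.handle("GET", note_uri)
status_deleted, _ = api.handle("DELETE", note_uri)
status_gone, _ = api.handle("GET", note_uri)
```

The client learns what happened purely from the status code: 201 for the creation, 200 for the read, 204 for the deletion, and 404 once the resource no longer exists.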
WEB SERVICES: Services made available over the web are called web services. They are described using
WSDL (Web Services Description Language).