Distributed File Systems
(DFS)
GROUP 6
PRESENTATION
Introduction
Web servers
P2P file sharing
Distributed storage systems
-Distributed file systems
-Distributed object systems
Components in a DFS
Implementation
Client side:
What has to happen to enable applications to access a remote file in
the same way as a local file
Communication layer:
Just TCP/IP, or a protocol at a higher level of abstraction?
Server side:
How does the server service requests from clients?
Goals of distributed file service
To enable programs to store and access remote files exactly as they
do local ones
Data sharing among multiple users
Transparency: access, location, mobility, performance, scaling
Backups and centralized management
Distributed file system
requirements
•File replication: A file may be represented by several copies of its contents at
different locations
•Fault tolerance: The service continues to operate in the face of client and
server failures
•Concurrent file updates: changes to a file by one client should not interfere
with the operation of other clients simultaneously accessing or changing the
same file
•Hardware and operating system heterogeneity using middleware
•Consistency: An inevitable delay in the propagation of modifications to all sites
File service architecture
Sun NFS (Network File System)
introduced in 1985
An important goal of NFS is to achieve a high level of support for
hardware and operating system heterogeneity
The first file service designed as a product
RFC1813: NFS protocol version 3
Each computer can act as both a client and a server
Industry standard for local networks since the 1980s
OS independent (originally a UNIX implementation)
– RPC over UDP or TCP
Access control and authentication
The NFS server is a stateless server, so the user's identity
and access rights must be checked by the server on each
request.
In a local file system, by contrast, they are checked only
against the file's access permission attributes when the file
is opened.
Every client request is accompanied by the userID and
groupID
Kerberos has been integrated with NFS to provide a
stronger and more comprehensive security solution
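The stateless, per-request check described above can be sketched as follows; all names here (FILE_TABLE, can_read, handle_read) are illustrative, not part of any real NFS implementation:

```python
# Sketch of stateless per-request access checking, NFS-style: the server
# keeps no per-client session state, so every request's userID/groupID is
# checked against the file's UNIX permission bits.

FILE_TABLE = {
    # path -> (owner_uid, owner_gid, UNIX mode bits)
    "/export/report.txt": (1000, 100, 0o640),
}

def can_read(uid, gid, path):
    """Re-derive access rights from the request's credentials alone."""
    owner_uid, owner_gid, mode = FILE_TABLE[path]
    if uid == owner_uid:
        return bool(mode & 0o400)   # owner read bit
    if gid == owner_gid:
        return bool(mode & 0o040)   # group read bit
    return bool(mode & 0o004)       # other read bit

def handle_read(request):
    # Every client request carries the caller's userID and groupID.
    if not can_read(request["uid"], request["gid"], request["path"]):
        return {"status": "EACCES"}
    return {"status": "OK", "data": b"...file block..."}
```

Because no session state survives between requests, crashing and restarting the server loses nothing the protocol depends on.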
NFS architecture
Summary for NFS
access transparency: same system calls for local or remote files
Clients cache only blocks of data, not whole files
scalability: can cope with an increase in nodes without disruption of
service; the system should also withstand a high service load,
accommodate growth in users, and integrate new resources
file replication: read-only replication, no support for replication of files with
updates
security: Kerberos has been integrated to add authentication and encryption
An excellent example of a simple, robust, high-performance distributed service
mobility transparency: the mount table needs to be updated on each client (not
transparent)
Andrew File System (AFS)
developed at CMU for use as a campus computing and information
system
The design of AFS reflects an intention to support information
sharing on a large scale by minimizing client-server communication.
This was achieved by transferring whole files between server and client
computers and caching them at clients until the server receives a more
up-to-date version
Goal: provide transparent access to remote shared files
Andrew file system architecture
Andrew file system architecture
simplified
Two unusual design
characteristics
– Whole-file serving: the entire contents of directories
and files are transmitted to client computers
– Whole-file caching: clients permanently cache a copy
of a file (or a chunk of it) on their local disks
Scenario of AFS
Open a new shared remote file
– A user process issues open() for a file not in the local cache
– and then sends a request to the server
– The server returns the requested file
– The copy is stored in the client’s local UNIX file system and the
resulting UNIX file descriptor is returned to the client
• Subsequent read, write and other operations on the file are applied to
the local copy
Best suited to a university-style setup
Scenario of AFS continued…
• When the process in the client issues close()
– if the local copy has been updated, its contents are sent
back to the server
– server updates the contents and the timestamps on the
file
– the copy on the client’s local disk is retained
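The open/close scenario above can be sketched as a minimal whole-file caching loop. The Server and Client classes and their methods are illustrative stand-ins, not the real Vice/Venus interfaces:

```python
# Sketch of AFS-style whole-file caching: open() fetches the whole file
# into the local cache on a miss; reads and writes go to the local copy;
# close() writes the whole file back only if it was modified.

class Server:
    def __init__(self):
        self.files = {"/shared/notes.txt": b"v1"}  # master copies
    def fetch(self, path):
        return self.files[path]
    def store(self, path, data):
        self.files[path] = data

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # path -> local copy of the whole file
        self.dirty = set()   # paths modified since open

    def open(self, path):
        if path not in self.cache:           # miss: fetch the whole file
            self.cache[path] = self.server.fetch(path)
        return path                          # stands in for a UNIX fd

    def read(self, path):
        return self.cache[path]              # served entirely locally

    def write(self, path, data):
        self.cache[path] = data              # applied to the local copy
        self.dirty.add(path)

    def close(self, path):
        if path in self.dirty:               # send back only if updated
            self.server.store(path, self.cache[path])
            self.dirty.discard(path)
        # the copy on the local disk is retained for later opens

srv = Server()
c = Client(srv)
c.open("/shared/notes.txt")
c.write("/shared/notes.txt", b"v2")
c.close("/shared/notes.txt")
```

After close(), the server holds the updated contents while the client keeps its local copy, matching the scenario above.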
Characteristics
• Good for shared files likely to remain valid for long periods
– infrequently updated
– normally accessed by only a single user
– Overwhelming majority of file accesses
• Local cache can be allocated a substantial proportion of the disk space
– should be enough for a working set of files used by one user
Characteristics continued…
• Assumptions about average and maximum file size and
reference locality
– Files are small; most are less than 10KB in size
– Read operations are much more common than writes
– Sequential access is much more common than random
access
– Most files are written by only one user. When a file is
shared, it is usually only one user who modifies it
– Files are referenced in bursts. A file referenced recently
is very probably referenced soon.
• These assumptions do not hold for distributed database applications,
so AFS is not a good fit for them
Callback mechanism
Restart of workstation after failure
If client A edits and closes a file that client B has already opened
and cached, the server sends a notification (a callback) to client B
The workstation retains as many locally cached files as possible, but
callbacks may have been missed while it was down
Venus sends cache validation request to the Vice server
– the request contains the file's modification timestamp
– if the timestamp is current, the server replies valid and the
callback promise is reinstated
– if the timestamp is not current, the server replies cancelled
Callback mechanism
continued…
After a communication link failure, the callback must be renewed with the
above protocol before a new open if a time T has elapsed since the file was
cached or the callback promise was last validated
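The validation-on-restart protocol and the freshness interval T can be sketched as follows; all names and the value of T here are illustrative assumptions, not the real Venus/Vice code:

```python
# Sketch of Venus-style cache validation after a restart, when callbacks
# may have been missed while the workstation was down.

SERVER_MTIME = {"/shared/notes.txt": 100.0}   # Vice: authoritative timestamps

CACHE = {
    # path -> (cached data, modification timestamp recorded when cached)
    "/shared/notes.txt": (b"cached data", 100.0),
}
CALLBACK_PROMISE = {}                          # path -> "valid" | "cancelled"

def validate(path):
    """Send the cached file's modification timestamp to the server;
    reinstate the callback promise if it is current, else cancel it."""
    _, cached_mtime = CACHE[path]
    if cached_mtime == SERVER_MTIME[path]:
        CALLBACK_PROMISE[path] = "valid"       # cached copy may be used
    else:
        CALLBACK_PROMISE[path] = "cancelled"   # must refetch before use
    return CALLBACK_PROMISE[path]

T = 600.0  # assumed freshness interval, in seconds

def needs_validation(last_validated, now):
    # A callback promise must be renewed before open() if more than T
    # seconds have passed since it was last validated.
    return (now - last_validated) > T
```

The same check covers both cases on these slides: a workstation restart and a possible missed callback after a link failure.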
Scalability
AFS callback mechanism scales well with increasing number of users
– communication occurs only when a file has been updated
– in NFS's timestamp approach, a check occurs on each open
Since the majority of files are not accessed concurrently, and reads are
more frequent than writes, the callback mechanism performs better
Cache Consistency problem
Files are identified with one master copy residing at the server machine, but
copies (or parts) of the file are scattered in different caches
When a cached copy is modified, the changes need to be reflected on the
master copy to preserve the relevant consistency semantics.
The problem of keeping the cached copies consistent with the master file is
the cache-consistency problem
A client machine is sometimes faced with the problem of deciding whether a
locally cached copy of data is consistent with the master copy (and hence can
be used).
If the client machine determines that its cached data are out of date, it must
cache an up-to-date copy of the data before allowing further accesses
Cache-Update Policy
The policy used to write modified data blocks back to the server’s
master copy has a critical effect on the system’s performance and
reliability.
The simplest policy is to write data through to disk as soon as they are
placed in any cache.
The advantage of a write-through policy is reliability: little information is
lost when a client system crashes.
However, this policy requires each write access to wait until the
information is sent to the server, so it causes poor write performance.
Caching with write-through is equivalent to using remote service for
write accesses and exploiting caching only for read accesses
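A write-through policy can be sketched in a few lines; the class name and the plain dict standing in for the server's master copy are illustrative:

```python
# Sketch of a write-through cache: every write is sent to the server's
# master copy before the write completes, so a client crash loses little
# information -- at the cost of a server round trip per write.

class WriteThroughCache:
    def __init__(self, server_store):
        self.server = server_store   # dict standing in for the master copy
        self.cache = {}

    def write(self, block_id, data):
        self.cache[block_id] = data
        self.server[block_id] = data   # synchronous: wait for the server

    def read(self, block_id):
        if block_id not in self.cache:
            self.cache[block_id] = self.server[block_id]
        return self.cache[block_id]    # reads are served from the cache

master = {}
c = WriteThroughCache(master)
c.write("blk0", b"hello")
assert master["blk0"] == b"hello"      # already on the server
```

As the slide notes, this is effectively remote service for writes with caching exploited only for reads.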
Cache-Update Policy continued…
An alternative is the delayed-write policy, also known as write-back caching,
where we delay updates to the master copy.
Modifications are written to the cache and then are written through to the
server at a later time.
This policy has two advantages over write-through.
Firstly, because writes are made to the cache, write accesses complete
much more quickly.
Secondly, data may be overwritten before they are written back, in which
case only the last update needs to be written at all.
Unfortunately, delayed-write schemes introduce reliability problems,
since unwritten data are lost whenever a user machine crashes.
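A delayed-write (write-back) policy can be sketched the same way; the names are illustrative, and the explicit flush() stands in for whatever later write-back trigger a real system would use:

```python
# Sketch of delayed-write (write-back) caching: writes complete against
# the local cache, and dirty blocks are flushed to the master copy later.
# Repeated writes to one block are coalesced into a single flush.

class WriteBackCache:
    def __init__(self, server_store):
        self.server = server_store   # dict standing in for the master copy
        self.cache = {}
        self.dirty = set()           # blocks modified since the last flush

    def write(self, block_id, data):
        self.cache[block_id] = data  # fast: no server round trip
        self.dirty.add(block_id)

    def flush(self):
        for block_id in self.dirty:  # only the last update is written
            self.server[block_id] = self.cache[block_id]
        self.dirty.clear()

master = {}
c = WriteBackCache(master)
c.write("blk0", b"v1")
c.write("blk0", b"v2")               # overwrites the earlier update
assert "blk0" not in master          # nothing on the server yet: a crash
c.flush()                            # here would lose both writes
```

The two writes reach the server as one, illustrating both advantages from the slide, while the window before flush() illustrates the reliability risk.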
File Update Semantics
What is replication?