Distributed File System
Data-caching model
It takes advantage of the locality of reference found in
file accesses.
Data is copied from the server's node to the client's node
and is cached on the client's node.
It introduces the cache consistency problem.
Unit of data transfer
1. File-level transfer model: AFS-2, Amoeba
2. Block-level transfer model: Sun Microsystems NFS
3. Byte-level transfer model: Cambridge File System
4. Record-level transfer model: structured files
File-sharing semantics
Unix semantics
It enforces an absolute time ordering on all
operations.
Session semantics
All changes made to a file during a session are
initially visible only to the client process that
opened the session, and are invisible to other remote
processes that have the same file open simultaneously.
Immutable shared-files semantics
Transaction-like semantics
It is based on the transaction mechanism, which is
a high-level mechanism for controlling concurrent
access to shared, mutable data.
File-caching schemes
Several key decisions, such as:
Granularity of cached data
Cache size
Replacement policy
Cache location
Modification propagation
Cache validation
Cache location
Issue:
When should modifications made to cached data be
propagated to the corresponding file server?
Delayed-write scheme
Write on ejection from cache
Periodic write
Write on close: matches session semantics
Delayed-write scheme
When a cache entry is modified, the new value
is written only to the cache, and the client just
makes a note that the cache entry has been
updated. The update is sent to the server later.
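The delayed-write idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `DelayedWriteCache`, its `dirty` set, and the plain dictionary standing in for the file server are hypothetical names, not part of any real file system.

```python
class DelayedWriteCache:
    """Client-side cache that defers propagation to the server (hypothetical)."""

    def __init__(self, server_store):
        self.server_store = server_store  # dict standing in for the file server
        self.entries = {}                 # block id -> cached data
        self.dirty = set()                # the "note" that an entry was updated

    def write(self, block_id, data):
        # Write only to the cache and remember that the entry changed.
        self.entries[block_id] = data
        self.dirty.add(block_id)

    def read(self, block_id):
        # Fetch from the server only on a cache miss.
        if block_id not in self.entries:
            self.entries[block_id] = self.server_store[block_id]
        return self.entries[block_id]

    def flush(self):
        # Called on ejection, periodically, or on close, depending on policy.
        for block_id in sorted(self.dirty):
            self.server_store[block_id] = self.entries[block_id]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed


server = {"b1": b"old"}
cache = DelayedWriteCache(server)
cache.write("b1", b"new")  # the server still holds b"old" at this point
```

When `flush()` is called decides which of the three variants (write on ejection, periodic write, write on close) the sketch behaves like.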
Cache validation schemes
Client-initiated approach
Checking before every access: Unix-like semantics
Periodic checking
Check on file open: session semantics
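A minimal sketch of the "check on file open" variant follows. It assumes the server keeps a version number per file that the client can compare against its cached copy; the version-number mechanism and the names `FileServer` and `ValidatingClient` are assumptions made for illustration.

```python
class FileServer:
    """Toy server that bumps a version number on every write (assumed)."""

    def __init__(self):
        self.files = {}  # name -> (version, data)

    def write(self, name, data):
        version = self.files.get(name, (0, b""))[0] + 1
        self.files[name] = (version, data)

    def get_version(self, name):
        return self.files[name][0]

    def fetch(self, name):
        return self.files[name]


class ValidatingClient:
    """Client that validates its cached copy each time a file is opened."""

    def __init__(self, server):
        self.server = server
        self.cache = {}   # name -> (version, data)
        self.fetches = 0  # number of full transfers from the server

    def open(self, name):
        cached = self.cache.get(name)
        # Client-initiated check: compare versions before using the cache.
        if cached is None or cached[0] != self.server.get_version(name):
            self.cache[name] = self.server.fetch(name)
            self.fetches += 1
        return self.cache[name][1]


server = FileServer()
server.write("notes.txt", b"v1")
client = ValidatingClient(server)
first = client.open("notes.txt")   # miss: fetched from the server
second = client.open("notes.txt")  # valid: served from the cache
server.write("notes.txt", b"v2")
third = client.open("notes.txt")   # stale: fetched again
```

Moving the same check into every read would give the "checking before every access" variant, at the cost of one server round trip per access.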
Server-initiated approach
A client informs the file server when opening a file,
indicating whether the file is being opened for reading,
writing, or both.
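The bookkeeping behind the server-initiated approach might look like the sketch below. The slides state only that clients report the open mode; the conflict rule used here (a writer plus at least one other client) and the `OpenTracker` class are assumptions for illustration, and what the server does once it detects a conflict is left out.

```python
from collections import defaultdict


class OpenTracker:
    """Server-side record of which clients have which files open (hypothetical)."""

    def __init__(self):
        self.opens = defaultdict(dict)  # file name -> {client id: mode}

    def open(self, client_id, name, mode):
        if mode not in ("r", "w", "rw"):
            raise ValueError("mode must be 'r', 'w', or 'rw'")
        self.opens[name][client_id] = mode
        return self.has_conflict(name)

    def close(self, client_id, name):
        self.opens[name].pop(client_id, None)

    def has_conflict(self, name):
        # Assumed rule: inconsistency is possible when some client writes
        # while at least one other client has the file open.
        modes = self.opens[name]
        writers = [c for c, m in modes.items() if "w" in m]
        return bool(writers) and len(modes) > 1


tracker = OpenTracker()
a = tracker.open("c1", "report.txt", "r")   # one reader, no conflict
b = tracker.open("c2", "report.txt", "r")   # two readers, no conflict
c = tracker.open("c3", "report.txt", "w")   # a writer joins: conflict
tracker.close("c3", "report.txt")
d = tracker.has_conflict("report.txt")      # writer gone: no conflict
```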
2. Increased reliability
6. Better scalability
7. Autonomous operation
File replication
Replication transparency
• Naming of replicas
• Replication control
1. Explicit replication
2. Read-any-write-all protocol
3. Available-copies protocol
4. Primary-copy protocol
5. Quorum-based protocols
Multicopy Update Problem
Read-only replication
Replication of only Immutable Files
Read-any-write-all protocol
For mutable files
Provides Unix-like semantics
A read may use any copy; a write locks all copies
and updates every one of them
Available-copies protocol
Update only the available copies (some servers may be
down)
Primary-copy protocol
One copy is designated the primary copy; the rest are
secondary copies
Write operations are performed only on the primary copy
Secondary copies are updated by push or pull, either
immediately (Unix semantics) or lazily
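The primary-copy protocol can be illustrated with a small in-memory sketch. The class `PrimaryCopyFile` and its `lazy` flag are hypothetical; the sketch models only the push style of update and ignores failures and networking.

```python
class PrimaryCopyFile:
    """One primary replica plus secondaries, held in memory (hypothetical)."""

    def __init__(self, n_secondaries, lazy=False):
        self.primary = b""
        self.secondaries = [b""] * n_secondaries
        self.lazy = lazy  # True: secondaries are updated later, not on write

    def write(self, data):
        # Writes are applied only to the primary copy.
        self.primary = data
        if not self.lazy:
            self.push()

    def push(self):
        # Propagate the primary's contents to every secondary.
        self.secondaries = [self.primary] * len(self.secondaries)

    def read(self, replica=None):
        # Read from the primary by default, or from a chosen secondary.
        return self.primary if replica is None else self.secondaries[replica]


eager = PrimaryCopyFile(2)
eager.write(b"x")              # secondaries updated immediately

lazy = PrimaryCopyFile(2, lazy=True)
lazy.write(b"y")               # secondaries still hold the old contents
stale = lazy.read(1)
lazy.push()                    # lazy propagation happens here
fresh = lazy.read(1)
```

With `lazy=True`, a read from a secondary can return stale data until the next `push()`, which is the price paid for cheaper writes.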
Quorum-based protocol
2. Robustness
3. Recoverability
Fault tolerance
Storage
1. Volatile storage
2. Nonvolatile storage
3. Stable storage
Fault tolerance & Service paradigm
1. Atomicity
Failure atomicity / all-or-nothing property
3. Permanence / durability
Need for transactions in a file service
Concurrent access
Operations for transaction-based file service
1. begin_transaction
2. end_transaction
3. abort_transaction
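To show how the three operations fit together, here is a minimal in-memory sketch. Only the operation names come from the list above; the class `TransactionalFileService`, the `read` and `write` helpers, and the buffering strategy are assumptions for illustration.

```python
import itertools


class TransactionalFileService:
    """In-memory file service with all-or-nothing updates (hypothetical)."""

    def __init__(self):
        self.files = {}     # committed state: file name -> data
        self.pending = {}   # transaction id -> buffered writes
        self._ids = itertools.count(1)

    def begin_transaction(self):
        tid = next(self._ids)
        self.pending[tid] = {}
        return tid

    def write(self, tid, name, data):
        # Writes are buffered; the committed state is not touched yet.
        self.pending[tid][name] = data

    def read(self, tid, name):
        # A transaction sees its own buffered writes first.
        return self.pending[tid].get(name, self.files.get(name))

    def end_transaction(self, tid):
        # Commit: apply every buffered write.
        self.files.update(self.pending.pop(tid))

    def abort_transaction(self, tid):
        # Abort: discard every buffered write.
        self.pending.pop(tid)


svc = TransactionalFileService()
t1 = svc.begin_transaction()
svc.write(t1, "a", b"1")
svc.abort_transaction(t1)
after_abort = dict(svc.files)    # nothing from t1 is visible

t2 = svc.begin_transaction()
svc.write(t2, "a", b"1")
svc.write(t2, "b", b"2")
svc.end_transaction(t2)
after_commit = dict(svc.files)   # both writes are visible
```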
Recovery Techniques
File versions approach
Avoids overwriting the actual data in physical storage.
When a transaction begins, the server creates a tentative
version from the current version for write operations.
When the transaction is committed, the tentative version
becomes the new current version, and the previous current
version is added to the sequence of old versions.
Serializability conflict: arises when merging tentative
versions, if two or more transactions are allowed to access
the same data item and at least one access is a write
operation.
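A rough sketch of the file versions approach follows. It assumes a very simple conflict rule, not taken from the slides: a tentative version may commit only if the current version has not changed since the tentative version was created. The `VersionedFile` class is hypothetical.

```python
class VersionedFile:
    """A file kept as a current version plus a sequence of old versions."""

    def __init__(self, data=b""):
        self.current = data
        self.current_no = 0
        self.old_versions = []

    def begin(self):
        # The tentative version starts as a copy of the current version.
        return {"base": self.current_no, "data": bytearray(self.current)}

    def commit(self, tentative):
        # Assumed rule: reject the commit if another transaction has
        # committed since this tentative version was created.
        if tentative["base"] != self.current_no:
            return False
        self.old_versions.append(self.current)
        self.current = bytes(tentative["data"])
        self.current_no += 1
        return True


f = VersionedFile(b"abc")
t1 = f.begin()
t2 = f.begin()
t1["data"] += b"1"
t2["data"] += b"2"
ok1 = f.commit(t1)   # commits: nothing changed since t1 began
ok2 = f.commit(t2)   # conflicts: t1 committed in the meantime
```

A real service could try to merge non-overlapping changes instead of rejecting the second commit outright; the rejection here just keeps the sketch short.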
Shadow blocks technique for implementing file versions
The shadow blocks technique is an optimization that allows
a tentative version of a file to be created without copying
the full file; it removes most of the copying.
The entire disk space is partitioned into blocks.
The file system maintains an index for each file and a list
of free blocks.
A tentative version of a file is created by copying the
index of the current version of the file.
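The following sketch illustrates the shadow blocks idea with an in-memory block array. All names (`ShadowBlockStore`, `begin`, `write`, `commit`, `abort`) are hypothetical, and freeing the replaced blocks at commit is a simplification: a service that keeps old versions would retain them instead.

```python
class ShadowBlockStore:
    """Block store with per-file indexes and a free list (hypothetical)."""

    def __init__(self, n_blocks):
        self.blocks = [b""] * n_blocks
        self.free = list(range(n_blocks))  # list of free block numbers
        self.index = {}                    # file name -> block numbers

    def _alloc(self, data):
        block_no = self.free.pop(0)
        self.blocks[block_no] = data
        return block_no

    def create(self, name, chunks):
        self.index[name] = [self._alloc(c) for c in chunks]

    def begin(self, name):
        # Only the index is copied; the data blocks are shared.
        return {"name": name, "index": list(self.index[name]), "shadows": set()}

    def write(self, t, position, data):
        old = t["index"][position]
        if old in t["shadows"]:
            self.blocks[old] = data  # already private to this version
            return
        new = self._alloc(data)      # shadow block taken from the free list
        t["shadows"].add(new)
        t["index"][position] = new

    def commit(self, t):
        old_index = self.index[t["name"]]
        self.index[t["name"]] = t["index"]
        # Simplification: blocks no longer referenced are freed at once.
        self.free.extend(b for b in old_index if b not in t["index"])

    def abort(self, t):
        self.free.extend(t["shadows"])

    def read(self, index):
        return b"".join(self.blocks[b] for b in index)


store = ShadowBlockStore(8)
store.create("f", [b"aa", b"bb", b"cc"])
t = store.begin("f")
store.write(t, 1, b"BB")
before = store.read(store.index["f"])  # current version is untouched
tentative = store.read(t["index"])     # tentative version sees the change
store.commit(t)
after = store.read(store.index["f"])
```

Only one block is written for the tentative version; the other two are shared with the current version through the copied index.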
The write-ahead log approach
A log file, the “write-ahead log”, records the
operations of a transaction that modifies a file, so
that file updates are recorded in a recoverable manner.
The write-ahead log is maintained on stable storage.
For each operation, a record is first created and
written to the log.
Only after this is the operation performed on the file
to modify its contents.
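The log-first ordering can be sketched as follows. Several things here are assumptions made for illustration: an ordinary local file flushed with `os.fsync` stands in for stable storage, records are JSON lines, and recovery replays only the writes of committed transactions. The names `WriteAheadLog`, `logged_write`, `commit`, and `recover` are hypothetical.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Append-only log of JSON records, one per line (hypothetical)."""

    def __init__(self, log_path):
        self.log_path = log_path

    def append(self, record):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the record to disk first

    def records(self):
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, encoding="utf-8") as log:
            return [json.loads(line) for line in log if line.strip()]


def logged_write(wal, files, tid, name, data):
    # 1. Write the log record. 2. Only then modify the file contents.
    wal.append({"tid": tid, "op": "write", "file": name, "data": data})
    files[name] = data


def commit(wal, tid):
    wal.append({"tid": tid, "op": "commit"})


def recover(wal):
    # Assumed recovery rule: redo the writes of committed transactions only.
    records = wal.records()
    committed = {r["tid"] for r in records if r["op"] == "commit"}
    files = {}
    for r in records:
        if r["op"] == "write" and r["tid"] in committed:
            files[r["file"]] = r["data"]
    return files


log_dir = tempfile.mkdtemp()
wal = WriteAheadLog(os.path.join(log_dir, "wal.log"))
files = {}
logged_write(wal, files, 1, "a.txt", "hello")
commit(wal, 1)
logged_write(wal, files, 2, "a.txt", "partial")  # no commit: simulated crash
recovered = recover(wal)
```

After the simulated crash, `recover` rebuilds the state from the log alone, so the uncommitted write by transaction 2 is dropped.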
Concurrency control
Allows maximum concurrency with minimum
overhead.
Timestamps
Locking
In the basic locking mechanism, a transaction locks
a data item before accessing it.
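A minimal sketch of the basic locking mechanism follows. The `LockManager` class is hypothetical; it uses one exclusive lock per data item and returns `False` instead of blocking when the item is already locked, which is a simplification.

```python
class LockManager:
    """One exclusive lock per data item, no blocking (hypothetical)."""

    def __init__(self):
        self.owner = {}  # data item -> id of the transaction holding its lock

    def lock(self, tid, item):
        # Grant the lock if the item is free or already held by this transaction.
        holder = self.owner.setdefault(item, tid)
        return holder == tid

    def unlock_all(self, tid):
        # Release every lock held by the transaction, e.g. at commit or abort.
        for item in [i for i, t in self.owner.items() if t == tid]:
            del self.owner[item]


locks = LockManager()
got1 = locks.lock("T1", "x")   # T1 locks x before accessing it
got2 = locks.lock("T2", "x")   # T2 must wait: x is held by T1
locks.unlock_all("T1")
got3 = locks.lock("T2", "x")   # now T2 can proceed
```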
Drawbacks:
Old versions of files must be retained for the
validation process.
Starvation of transactions.
6. Batch if possible
Grouping operations together can improve throughput.
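As a small illustration of batching, the sketch below queues operations on the client and sends them in groups. The `BatchingClient` class, its `batch_size` parameter, and the `send` callable are hypothetical; `send` stands in for a single request to the server.

```python
class BatchingClient:
    """Groups operations so that several travel in one request (hypothetical)."""

    def __init__(self, send, batch_size=3):
        self.send = send              # callable that takes a list of operations
        self.batch_size = batch_size
        self.queue = []

    def submit(self, op):
        self.queue.append(op)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send whatever is queued as one request.
        if self.queue:
            self.send(self.queue)
            self.queue = []


requests = []                         # each element is one request to the server
client = BatchingClient(requests.append, batch_size=3)
for i in range(7):
    client.submit(("write", i))
client.flush()                        # send the final partial batch
```

Seven operations reach the server in three requests instead of seven, which is where the throughput gain comes from.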