
Distributed File System

 Purpose of files:
 Permanent storage of information
 Sharing of information

 A distributed file system provides an abstraction to the users of a distributed system and makes it convenient to use files in a distributed environment.
Advantages of DFS
 Remote information sharing
 User mobility
 Availability
 Diskless workstations

DFS provides the following services:


 Storage service
 True file service
 Name service
Desirable features of a good DFS
 Transparency
 Structure transparency
 Access transparency
 Naming transparency
 Replication transparency
Cont…
 Performance
 Usually measured as the average amount of time needed to satisfy client requests.
 In a centralized file system, this time includes the time for accessing the secondary storage device on which the file is stored plus the CPU processing time.
 In a DFS, this time additionally includes the network communication overhead when the accessed file is remote.
Cont…
 User mobility
 Simplicity & ease of use
 Scalability
 High availability
 High reliability
 Data integrity
 Security
 Heterogeneity
File Models
 Unstructured and structured files
 Unstructured – Unix, DOS
 Structured – Ordered sequence of records
 Mutable and immutable files
Unstructured and Structured files

 Structured file model
 A file appears to the file server as an ordered sequence of records.
 Structured files are of two types:
 Files with non-indexed records
 Files with indexed records

 Unstructured file model
 A file is an uninterpreted sequence of bytes; any structure is imposed by the applications that use it.
 Sharing of files by different applications is easier.
Mutable and immutable files
 Mutable file model
 In this model, an update performed on a file overwrites its old contents to produce the new contents.
 The file is represented as a single stored sequence that is altered by each update operation.

 Immutable file model
 In this model, a file cannot be modified once it has been created, except to be deleted.
 A file versioning approach is used: each update creates a new version of the file.
File-accessing models
 Method used for accessing remote files

 Unit of data access


Accessing remote files
 Remote service model
 Processing of the client's request is performed at the server's node.
 Every remote file access request results in network traffic.

 Data-caching model
 It takes advantage of the locality found in file accesses.
 Data is copied from the server's node to the client's node and is cached there.
 It introduces the cache consistency problem.
Unit of data transfer
1. File-level transfer model: AFS-2, Amoeba
2. Block-level transfer model: Sun Microsystems NFS
3. Byte-level transfer model: Cambridge File System
4. Record-level transfer model: structured files
File-sharing semantics
 Unix semantics
 It enforces an absolute time ordering on all operations: every read sees the effects of all writes performed before it.
 Session semantics
 All changes made to a file during a session are initially visible only to the client process that opened the session, and
 Invisible to other remote processes that have the same file open simultaneously (see the sketch after this list).
 Immutable shared-files semantics

 Transaction-like semantics
 It is based on the transaction mechanism, which is a high-level mechanism for controlling concurrent access to shared, mutable data.
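
As an illustration of session semantics, here is a minimal Python sketch (the class and file names are invented for this example, not taken from any real DFS): a client works on a private copy obtained when it opens the file, and its changes become visible to other clients only after it closes the file.

# Minimal sketch of session semantics (hypothetical names, illustration only).

class FileServer:
    """Holds the authoritative copy of each file."""
    def __init__(self):
        self.files = {}          # name -> contents

class Session:
    """A client's open-file session: reads and writes go to a private copy."""
    def __init__(self, server, name):
        self.server, self.name = server, name
        self.copy = server.files.get(name, "")    # snapshot taken at open time

    def read(self):
        return self.copy

    def write(self, data):
        self.copy = data                           # invisible to other clients

    def close(self):
        self.server.files[self.name] = self.copy   # changes published on close


server = FileServer()
server.files["a.txt"] = "v1"

s1 = Session(server, "a.txt")      # client 1 opens the file
s2 = Session(server, "a.txt")      # client 2 opens the same file
s1.write("v2")
print(s2.read())                   # still "v1": s1's update is not yet visible
s1.close()
print(Session(server, "a.txt").read())   # "v2": visible only after close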
File-caching schemes
 Several key decisions, such as:
 Granularity of cached data
 Cache size
 Replacement policy
 Cache location
 Modification propagation
 Cache validation
Cache location

 The place where the cached data is stored.

 Possible cache locations in a DFS are:
 Server's main memory
 Client's disk (not an option for diskless workstations)
 Client's main memory
Modification propagation
 When client caching is used, the multiple copies of the same data at different nodes must be kept consistent.

 Issues:
 When to propagate modifications made to cached data to the corresponding file server?

 How to verify the validity of cached data?


Modification Propagation Schemes
 Write-through scheme
 Unix-like semantics: the master copy always remains up to date.

 Delayed-write scheme (variants, sketched below)
 Write on ejection from cache
 Periodic write
 Write on close: like session semantics
Delayed-write Scheme
 When a cache entry is modified, the new value is written only to the cache, and the client just makes a note that the cache entry has been updated.
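
The contrast between the two propagation policies can be sketched as follows. This is an illustrative, in-memory model with invented class names; a real DFS client would propagate the updates to the file server over the network.

# Illustrative sketch of write-through vs. delayed-write propagation.

class WriteThroughCache:
    """Every write is sent to the server immediately (Unix-like semantics)."""
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, name, data):
        self.cache[name] = data
        self.server[name] = data        # master copy always stays up to date

class DelayedWriteCache:
    """Writes go only to the cache; dirty entries are flushed later."""
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def write(self, name, data):
        self.cache[name] = data
        self.dirty.add(name)            # just note that the entry was modified

    def flush(self, name=None):
        """Called on ejection from the cache, periodically, or on close."""
        names = [name] if name else list(self.dirty)
        for n in names:
            self.server[n] = self.cache[n]
            self.dirty.discard(n)


server = {}
dw = DelayedWriteCache(server)
dw.write("a.txt", "new contents")
print(server)        # {}  -- the server has not been updated yet
dw.flush()           # e.g. write-on-close
print(server)        # {'a.txt': 'new contents'}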
Cache validation schemes
 Client-initiated approach
 Checking before every access: Unix-like semantics
 Periodic checking
 Check on file open: session semantics
Cont…
 Server-initiated approach
 A client informs the file server when opening a file, indicating whether the file is being opened for reading, writing, or both.

 The server keeps monitoring the file usage modes by keeping a record of which client has which file open and in what mode.
 Simultaneous read accesses are allowed, but simultaneous read and write accesses are not.
Drawback of Server-initiated approach
 Violates the client-server paradigm (the server must contact its clients)

 Requires stateful servers

Callback Policy
 The server keeps a record of the clients that have cached the file.
 A cached entry is assumed to be valid unless the client is notified otherwise by the server.
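
A rough sketch of the callback policy is given below; the classes are hypothetical and the callback is a direct method call standing in for a server-to-client RPC.

# Minimal sketch of a server-initiated (callback) validation scheme.

class CallbackServer:
    def __init__(self):
        self.files = {}               # name -> contents
        self.callbacks = {}           # name -> set of clients caching it

    def fetch(self, name, client):
        self.callbacks.setdefault(name, set()).add(client)   # remember who caches it
        return self.files.get(name, "")

    def update(self, name, data):
        self.files[name] = data
        for client in self.callbacks.pop(name, set()):
            client.invalidate(name)   # break the callback promise

class CallbackClient:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def read(self, name):
        if name not in self.cache:    # valid unless the server notified us
            self.cache[name] = self.server.fetch(name, self)
        return self.cache[name]

    def invalidate(self, name):
        self.cache.pop(name, None)


server = CallbackServer()
server.files["a.txt"] = "v1"
c = CallbackClient(server)
print(c.read("a.txt"))    # "v1", cached with a callback registered
server.update("a.txt", "v2")
print(c.read("a.txt"))    # cache was invalidated, so "v2" is fetched again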
File replication
 Replication and Caching
 A replica is associated with a server, whereas a cached copy is normally associated with a client.

 The existence of a cached copy is primarily dependent on the locality in the access patterns, whereas the existence of a replica normally depends on availability and performance requirements.
File replication
 As compared to a cached copy, a replica is more persistent, widely known, secure, available, complete, and accurate.

 A cached copy is contingent upon a replica: only by periodic revalidation with respect to a replica can a cached copy be useful.
File Replication Advantages
1. Increased availability

2. Increased reliability

3. Improved response time

4. Reduced network traffic

5. Improved system throughput

6. Better scalability

7. Autonomous operation
File replication
 Replication transparency
• Naming of replicas

• Replication control
1. Explicit replication

2. Implicit / lazy replication


Multicopy Update Problem
 Commonly used approaches to handle this:
1. Read-only replication

2. Read-any-write-all protocol

3. Available-copies protocol

4. Primary-copy protocol

5. Quorum-based protocols
Multicopy Update Problem

 Read-only replication
 Replication of immutable files only

 Read-any-write-all protocol
 For mutable files
 Unix-like semantics
 Lock all copies and update
 Available-copies protocol
 Update the available copies (some servers may be down)

 Primary-copy protocol
 One copy is designated as the primary copy; the rest are secondary copies
 Write operations are performed only on the primary copy
 Secondary copies are updated by push/pull, either immediately (Unix semantics) or lazily
Quorum-based protocol

 Handles network partitions

 Let there be n copies of file F
 Read operation – a minimum of r copies of F are consulted
 Write operation – a minimum of w copies of F are written
 r + w > n
 Guarantees that every read sees at least one up-to-date copy
 A version number is associated with each copy
 The copy with the highest version number is the most recent / up-to-date one
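
The following sketch illustrates the quorum rule with in-memory replicas (the names and the random quorum selection are only illustrative). Because r + w > n, every read quorum overlaps the most recent write quorum, so the copy with the highest version number is always seen.

# Sketch of a quorum-based read/write protocol for n replicas.

import random

class Replica:
    def __init__(self):
        self.version, self.value = 0, None

def quorum_read(replicas, r):
    """Consult at least r replicas and return the most recent value."""
    sample = random.sample(replicas, r)
    newest = max(sample, key=lambda rep: rep.version)
    return newest.version, newest.value

def quorum_write(replicas, r, w, value):
    """Find the current version via a read quorum, then write w replicas."""
    version, _ = quorum_read(replicas, r)
    for rep in random.sample(replicas, w):
        rep.version, rep.value = version + 1, value

n, r, w = 5, 3, 3            # r + w > n guarantees overlapping quorums
replicas = [Replica() for _ in range(n)]
quorum_write(replicas, r, w, "hello")
print(quorum_read(replicas, r))   # always (1, 'hello'): any read quorum
                                  # overlaps the last write quorum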
Special Cases of Quorum Protocol

 Read-any-write-all (r = 1, w = n)
 Suitable when the ratio of reads to writes is large
 Read-all-write-any (r = n, w = 1)
 Suitable when the ratio of writes to reads is large
 Majority consensus protocol
 When the ratio of reads to writes is about 1 (r and w are both majorities)
 Consensus with weighted voting
 Gives a higher weight to frequently accessed copies
Fault tolerance
 Properties
1. Availability

2. Robustness

3. Recoverability
Fault tolerance
 Storage
1. Volatile storage

2. Nonvolatile storage

3. Stable storage
Fault tolerance & Service paradigm

 Stateful file servers
 Maintain state information about client requests between the open and close operations on a file; this period is called a session.
 Stateless file servers
 Do not maintain any client state; every request is self-contained.
Atomic transactions
Essential properties of transactions:

1. Atomicity
 Failure atomicity / all-or-nothing property

 Concurrency atomicity / consistency property

2. Serializability / isolation property

3. Permanence / durability
Need for transactions in a file service

 For improving the recoverability of files in the event of failures.

 For allowing the concurrent sharing of mutable files by multiple clients in a consistent manner.

 Inconsistency may be due to:
 System failure, or
 Concurrent access
Operations for transactions-based file service
1. begin_transaction
2. end_transaction
3. abort_transaction
Recovery Techniques
 File versions approach
 Avoids overwriting the actual data in physical storage.
 When a transaction begins, the server creates a tentative version from the current version for its write operations.
 When the transaction is committed, the tentative version is made the new current version, and
 the previous current version is added to the sequence of old versions.
 Serializability conflict – may arise when merging the various tentative versions:
 when two or more transactions are allowed to access the same data item and one or more of these accesses is a write operation.
Recovery Techniques
 Shadow blocks technique for implementing file versions
 The shadow blocks technique is used as an optimization that allows creation of a tentative version of a file without the need to copy the full file; in fact, it removes most of the copying.
 Here the entire disk space is partitioned into blocks.
 The file system maintains an index for each file and a list of free blocks.
 A tentative version of a file is created by copying only the index of the current version of the file.
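
A minimal sketch of the shadow-blocks idea, with invented names and a dictionary standing in for the disk: the tentative version shares the current version's blocks, and only the blocks that are actually modified are copied.

# Sketch of shadow blocks: a tentative version is a copy of the index only.

class BlockStore:
    def __init__(self):
        self.blocks = {}                 # block id -> data
        self.next_id = 0

    def alloc(self, data):
        self.blocks[self.next_id] = data
        self.next_id += 1
        return self.next_id - 1

class FileVersion:
    """A file version is just an index: an ordered list of block ids."""
    def __init__(self, index):
        self.index = list(index)

def begin_tentative(current):
    # Creating a tentative version copies only the index, not the data blocks.
    return FileVersion(current.index)

def write_block(store, tentative, block_no, data):
    # Copy-on-write: the modified block gets a fresh (shadow) block id.
    tentative.index[block_no] = store.alloc(data)

store = BlockStore()
current = FileVersion([store.alloc("AAAA"), store.alloc("BBBB")])
tent = begin_tentative(current)
write_block(store, tent, 1, "bbbb")
print([store.blocks[b] for b in current.index])  # ['AAAA', 'BBBB'] unchanged
print([store.blocks[b] for b in tent.index])     # ['AAAA', 'bbbb']
current = tent                                   # commit: tentative becomes current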
Recovery Techniques
 The write-ahead log approach
 A log file, the "write-ahead log", is used to record the operations of a transaction that modifies a file in a recoverable manner.
 The write-ahead log is maintained on stable storage.
 For each update, a record describing the operation is first created and written to the log.
 Only after this is the operation performed on the file to modify its contents.
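
A toy sketch of the write-ahead log discipline (a Python list stands in for stable storage, and only undo-style recovery is shown): the log record is written before the file is modified, so an uncommitted update can be rolled back after a crash.

# Sketch of the write-ahead log idea: log first, then modify the file.

stable_log = []        # stands in for a log file on stable storage
files = {"a.txt": "old"}

def wal_update(tid, name, new_value):
    # 1. First create the log record and write it to the log ...
    stable_log.append({"tid": tid, "file": name,
                       "old": files[name], "new": new_value})
    # 2. ... only then perform the operation on the file itself.
    files[name] = new_value

def recover(committed_tids):
    # After a crash, undo updates of transactions that never committed.
    for rec in reversed(stable_log):
        if rec["tid"] not in committed_tids:
            files[rec["file"]] = rec["old"]

wal_update(tid=1, name="a.txt", new_value="new")
recover(committed_tids=set())     # transaction 1 did not commit
print(files)                      # {'a.txt': 'old'} -- update rolled back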
Concurrency control
 Allows maximum concurrency with minimum overhead.

 Ensures that transactions run in such a way that their effects on shared data are serially equivalent.
Cont…

 Approaches used are:
 Locking
 Optimistic concurrency control
 Timestamps
Locking
 In the basic locking mechanism, a transaction locks a data item before accessing it.

 Optimized locking for better concurrency
 Type-specific locking
 Intention-to-write locks
 Read, i-write & commit locks
 (while an i-write lock is held, read operations are permitted; while a commit lock is held, they are not)

 Two-phase locking protocol (sketched below)
1. Growing phase – locks are acquired but never released
2. Shrinking phase – locks are released but no new locks are acquired
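
The two-phase locking rule can be sketched as follows; the class is hypothetical, and lock conflicts simply raise an exception instead of blocking.

# Sketch of two-phase locking for a single transaction.

class TwoPhaseTransaction:
    def __init__(self, lock_table):
        self.lock_table = lock_table     # shared dict: item -> owning transaction
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: cannot lock after first unlock")
        if self.lock_table.get(item) not in (None, self):
            raise RuntimeError(f"{item} is locked by another transaction")
        self.lock_table[item] = self
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True            # entering the shrinking phase
        self.lock_table.pop(item, None)
        self.held.discard(item)


locks = {}
t = TwoPhaseTransaction(locks)
t.lock("f1"); t.lock("f2")   # growing phase
t.unlock("f1")               # shrinking phase begins
try:
    t.lock("f3")             # not allowed any more
except RuntimeError as e:
    print(e)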
Locking

 Granularity of locking – the unit of lockable data items.

 Handling of locking deadlocks
 Avoidance
 Detection
 Timeouts
Optimistic Concurrency Control

 Transactions are allowed to proceed uncontrolled up to the end of the first phase.

 In the second phase, before a transaction is committed, it is validated to see if any of its data items have been changed by any other transaction since it started.

 The transaction is committed if found valid; otherwise it is aborted.
Contd..
 For the validation process, two records are kept of the data items within a transaction:
 read set
 write set

 To validate a transaction, its read and write sets are compared with the write sets of all the concurrent transactions that have reached the end of the first phase.
Contd..

 If any data item present in the read set or write set of the transaction being validated is also present in the write set of any concurrent transaction, the validation fails.
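
A minimal sketch of this validation step, using plain read/write sets (the structure of the transaction records is invented, and real systems also take the serial order of transactions into account):

# Sketch of optimistic validation against concurrently committed transactions.

def validate(txn, overlapping_committed):
    """txn = {'read_set': set, 'write_set': set}. Return True if valid."""
    for other in overlapping_committed:
        conflict = (txn["read_set"] | txn["write_set"]) & other["write_set"]
        if conflict:
            return False          # a data item we used was overwritten concurrently
    return True

t1 = {"read_set": {"a", "b"}, "write_set": {"b"}}
t2 = {"read_set": {"c"},      "write_set": {"a"}}   # committed concurrently
print(validate(t1, [t2]))   # False: 'a' was read by t1 and written by t2
print(validate(t1, []))     # True: no conflicting concurrent writes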
Cont…
 Advantages:
 Maximum parallelism

 Free from deadlock

 Drawbacks:
 Old versions of files are required to be retained for the validation process.
 A transaction may suffer starvation (repeated aborts and restarts).

 In an overloaded system, the number of transactions getting aborted may go up substantially.
Timestamps
 Conflicts are detected as soon as the operation causing them is executed.
 Each operation in a transaction is validated when it is carried out.
 If the validation fails, the transaction is aborted immediately, and it can then be restarted.
 Each transaction is assigned a unique timestamp at the moment it performs begin_transaction.
 Every data item has a read timestamp and a write timestamp.
Contd..

 When a transaction accesses a data item, depending on the type of access (read/write), the data item's timestamp is updated to the transaction's timestamp.

 The write operations of transactions are recorded tentatively and are invisible to other transactions until the transaction commits.
Validation of Write Operation

 If the timestamp of the current transaction is equal to or more recent than both the read and the (committed) write timestamps of the accessed data item, the write operation passes the validation check.
 If the timestamp of the current transaction is older than the timestamp of the last read or the last committed write of the data item, the validation fails.
Validation of Read Operation

 If the timestamp of the current transaction is more recent than the write timestamps of all committed and tentative values of the accessed data item, the read operation passes the validation check.
 The read operation can be performed immediately only if there are no tentative values of the data item; otherwise it must wait until the completion of the transactions holding tentative values of the data item.
Contd..

 The validation check fails and the current transaction is aborted in the following cases:
 The timestamp of the current transaction is older than the timestamp of the most recent (committed) write to the data item.
 The timestamp of the current transaction is older than that of a tentative value of the data item made by another transaction, even though it is more recent than the timestamp of the permanent (committed) data item.
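
A simplified sketch of these timestamp checks for a single data item (the field names are invented; tentative values and the waiting rule for reads are omitted for brevity):

# Sketch of timestamp-based validation for one data item.

class Item:
    def __init__(self):
        self.read_ts = 0      # timestamp of the most recent read
        self.write_ts = 0     # timestamp of the most recent committed write

def validated_write(item, txn_ts):
    # Valid only if the transaction is at least as recent as both timestamps.
    if txn_ts >= item.read_ts and txn_ts >= item.write_ts:
        item.write_ts = txn_ts
        return True
    return False              # an "older" transaction tried to write -> abort

def validated_read(item, txn_ts):
    # Valid only if the transaction is at least as recent as the committed write.
    if txn_ts >= item.write_ts:
        item.read_ts = max(item.read_ts, txn_ts)
        return True
    return False

x = Item()
print(validated_write(x, txn_ts=5))   # True
print(validated_read(x, txn_ts=3))    # False: transaction 3 is older than write 5
print(validated_read(x, txn_ts=7))    # True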
Distributed Transaction Service
 It supports transactions involving files managed by
more than one server.
 All servers need to communicate with one another
to coordinate their actions during the processing of
the transaction.
 A simple approach is to pass client requests through
a single server that holds the relevant file.
Contd..
 A client begins the transaction by sending a begin_transaction request to any server.
 The contacted server executes the begin_transaction request and returns the resulting TID to the client.
 This server becomes the coordinator for the transaction and is responsible for aborting or committing it and for adding other servers, called workers.
Contd..
 Workers are dynamically added to the transaction.
 The request
add_transaction (TID, server_id of coordinator)
informs a server that it is involved in the transaction TID.

 When the server receives an add_transaction request, it
 records the server identifier of the coordinator,
 makes a new transaction record containing the TID,
 initializes a new log to record the updates made to local files by the transaction, and
 makes a call to the coordinator to inform it of its intention to join the transaction.
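
A small sketch of this joining step, with direct method calls standing in for the RPCs and invented class names:

# Sketch of a worker joining a distributed transaction.

class Coordinator:
    def __init__(self, tid):
        self.tid, self.workers = tid, []

    def join(self, worker):
        self.workers.append(worker)      # keep a list of all workers

class Worker:
    def __init__(self):
        self.transactions = {}           # TID -> {'coordinator', 'log'}

    def add_transaction(self, tid, coordinator):
        # Record the coordinator, create a transaction record and a new log,
        # then tell the coordinator we intend to join the transaction.
        self.transactions[tid] = {"coordinator": coordinator, "log": []}
        coordinator.join(self)

coord = Coordinator(tid=42)
w = Worker()
w.add_transaction(42, coord)
print(len(coord.workers))   # 1 -- the coordinator now knows this worker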
Contd..
 Hence, each worker comes to know about the
coordinator and the coordinator comes to know
about and keeps a list of all the workers involved in
the transaction.
 This information enables the workers and
coordinator to coordinate with each other at commit
time.
Two-Phase Multiserver Commit Protocol
 When the client makes an end_transaction request, the coordinator and the workers in the transaction have tentative values in their logs.
 The coordinator decides whether the transaction should be aborted or committed.
 Hence, end_transaction is performed in two phases:
 Preparation phase &
 Commit phase
Preparation Phase

 The coordinator makes an entry in its log that it is starting the commit protocol.
 It then sends a prepare message to all the workers telling them to prepare to commit. The message has a time-out value associated with it.
 When a worker gets the prepare message, it checks to see if it is ready to commit.
 If so, it makes an entry in its log and replies with a ready message; otherwise it replies with an abort message.
Commit Phase

 If all the workers are ready to commit, the transaction is committed.

 The coordinator makes an entry in its log indicating that the transaction has been committed.

 It then sends a commit message to the workers asking them to commit.

 At this point, the transaction is effectively completed, so the coordinator can report success to the client.
Contd..

 If any of the replies was abort, or a worker's prepare message timed out, the transaction is aborted.
 The coordinator makes an entry in its log indicating that the transaction has been aborted.
 It then sends an abort message to the workers asking them to abort, and reports failure to the client.
Contd..
 When a worker receives the commit message, it makes a committed entry in its log and sends a committed reply to the coordinator.
 The worker's part of the transaction is then treated as completed, and the records it holds for the transaction are erased.
 When the coordinator has received a committed reply from all the workers, the transaction is considered complete, and all the records maintained by the coordinator are erased.
 The coordinator keeps resending the commit message until it receives a committed reply from all the workers.
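
A condensed sketch of the two-phase commit decision is shown below; messages, time-outs, and log records are reduced to method calls and return values, and the class names are invented.

# Sketch of the two-phase multiserver commit decision.

class Worker:
    def __init__(self, name, ready=True):
        self.name, self.ready = name, ready

    def prepare(self):
        # Phase 1: reply "ready" if able to commit, otherwise "abort".
        return "ready" if self.ready else "abort"

    def commit(self):
        return "committed"      # worker writes a committed entry in its log

    def abort(self):
        return "aborted"

def two_phase_commit(workers):
    # Phase 1: preparation.
    votes = [w.prepare() for w in workers]
    if all(v == "ready" for v in votes):
        # Phase 2: commit. The coordinator logs the decision first, then
        # keeps (re)sending commit until every worker has acknowledged.
        for w in workers:
            w.commit()
        return "committed"
    for w in workers:
        w.abort()
    return "aborted"

print(two_phase_commit([Worker("A"), Worker("B")]))               # committed
print(two_phase_commit([Worker("A"), Worker("B", ready=False)]))  # aborted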
Nested Transactions
 Nested transactions are a generalization of traditional transactions in which a transaction may be composed of sub-transactions.

 A sub-transaction may in turn have its own sub-transactions.

 In this way, transactions can be nested, forming a family of transactions.
Committing of Nested Transactions

 In a nested transaction, a transaction may commit only after all its descendants have committed.
 A transaction may abort at any time.
 Hence, in order to commit the whole transaction, its top-level transaction must wait for the other transactions in the family to commit.
Contd..

 Advantages of nested transactions


 It allows concurrency within a transaction.
 It provides greater protection against failures.
Design Principles for Distributed File System
1. Clients have cycles to burn.
 Preferably perform an operation on the client's machine rather than performing it on a server machine.

2. Cache whenever possible.
 Caching of data at clients' sites frequently improves overall system performance because it makes data locally available.

3. Exploit usage properties.
 Depending upon the usage properties (access and modification patterns), files should be grouped into a small number of identifiable classes.
Contd..

4. Minimize system-wide knowledge and change.
 Monitoring or automatic updating of global information is avoided.

5. Trust the fewest possible entities.
 Security is based on the integrity of a much smaller number of servers rather than on trusting thousands of clients.

6. Batch if possible.
 Grouping operations together can improve throughput.
