
Other File Systems:

LFS, NFS, and AFS

Goals for Today


Discuss specific file systems
both local and remote

Log-structured file system (LFS)


Distributed file systems (DFS)
Network file system (NFS)
Andrew file system (AFS)

Log-Structured File Systems


The trend: CPUs are faster, RAM & caches are bigger

So, a lot of reads do not require disk access


Most disk accesses are writes; pre-fetching is not very useful
Worse, most writes are small: ~10 ms of seek/rotation overhead for a ~50 µs write
Example: to create a new file:

i-node of directory needs to be written


Directory block needs to be written
i-node for the file has to be written
The file data itself needs to be written

Delaying these writes could hamper consistency

Solution: LFS to utilize full disk bandwidth


3

LFS Basic Idea


Structure the disk as a log
Periodically, all pending writes buffered in memory are collected
into a single segment
The entire segment is written contiguously at the end of the log

Segment may contain i-nodes, directory entries, data


Start of each segment has a summary
If segments are around 1 MB, the full disk bandwidth can be utilized

Note, i-nodes are now scattered on disk


Maintain i-node map (entry i points to i-node i on disk)
Part of it is cached, reducing the delay in accessing i-nodes

This description works great for disks of infinite size


4
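
As a rough illustration of the i-node map idea, here is a minimal C sketch (all names hypothetical, not from any real LFS):

/* Hypothetical sketch of an LFS i-node map: an array indexed by
 * i-node number whose entries give the disk address of the newest
 * copy of each i-node. A hot portion of this map is kept cached
 * in memory, so most lookups avoid a disk access. */
#include <stdint.h>
#include <stddef.h>

typedef uint64_t disk_addr_t;   /* block address somewhere in the log */

struct inode_map {
    disk_addr_t *entries;       /* entries[i] = location of i-node i */
    size_t       n_entries;
};

/* Return the on-disk location of i-node `ino`, or 0 if unknown. */
disk_addr_t imap_lookup(const struct inode_map *map, size_t ino)
{
    return (ino < map->n_entries) ? map->entries[ino] : 0;
}

/* When a segment carrying a fresh copy of i-node `ino` is written,
 * the map entry is simply redirected to the new location. */
void imap_update(struct inode_map *map, size_t ino, disk_addr_t where)
{
    if (ino < map->n_entries)
        map->entries[ino] = where;
}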

LFS vs. UFS

[Figure: blocks written to create two 1-block files, dir1/file1 and dir2/file2, in UFS and in LFS. In UFS, the i-nodes, directory blocks, and data blocks are scattered across the disk; in LFS, the i-node map, i-nodes, directory entries, and data blocks are written contiguously to the log.]

5

LFS Cleaning
Finite disk space implies that the disk is eventually full
Fortunately, some segments have stale information
A file overwrite causes i-node to point to new blocks
Old ones still occupy space

Solution: LFS Cleaner thread compacts the log


Read the segment summary, and check whether the contents are current
File blocks, i-nodes, etc.

If not, the segment is marked free, and the cleaner moves forward

Else, the cleaner copies the live content into a new segment at the end of the log
The old segment is then marked as free

The disk is a circular buffer: the writer adds content at the front,
and the cleaner cleans content from the back
6
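
A minimal C sketch of one cleaner pass. Simplifying assumption (a toy model, not a real LFS): the segment summary records, for each block, the i-node it belongs to and the log address it was written at, and each i-node has one data block, so a block is live iff the i-node map still points at that address:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct block_info { uint64_t ino; uint64_t addr; };  /* one summary entry */

struct segment {
    struct block_info blocks[256];
    size_t n_blocks;
    bool free;
};

/* Toy model: imap[i] = current address of i-node i's (single) block */
extern uint64_t imap[];

static bool block_is_live(const struct block_info *b)
{
    return imap[b->ino] == b->addr;   /* still the newest copy? */
}

void clean_segment(struct segment *seg,
                   void (*append_to_log)(const struct block_info *))
{
    for (size_t i = 0; i < seg->n_blocks; i++)
        if (block_is_live(&seg->blocks[i]))
            append_to_log(&seg->blocks[i]);   /* copy live data forward */
    seg->free = true;                         /* whole segment reusable */
}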

Distributed File Systems


Goal: view a distributed system as a file system
Storage is distributed
The Web tries to make the world a collection of hyperlinked documents

Issues not common to usual file systems

Naming transparency
Load balancing
Scalability
Location and network transparency
Fault tolerance

We will look at some of these today

Transfer Model
Upload/download Model:
Client downloads the file, works on it, and writes it back to the server
Simple and good performance

Remote Access Model:


File only on server; client sends commands to get work done

Naming transparency
Naming is a mapping from logical to physical objects
Ideally client interface should be transparent
Not distinguish between remote and local files
/machine/path naming or mounting a remote FS in the local hierarchy is not
transparent

A transparent DFS hides the location of files in system


2 forms of transparency:
Location transparency: path gives no hint of file location
/server1/dir1/dir2/x tells us that x is on server1, but not where server1 is

Location independence: move files without changing names


Separate naming hierarchy from storage devices hierarchy
9

File Sharing Semantics


Sequential consistency: reads see previous writes
Ordering on all system calls seen by all processors
Maintained in single processor systems
Can be achieved in DFS with one file server and no caching

10

Caching
Keep repeatedly accessed blocks in cache
Improves performance of further accesses

How it works:

If needed block not in cache, it is fetched and cached


Accesses performed on local copy
One master file copy on server, other copies distributed in DFS
Cache consistency problem: how to keep cached copy
consistent with master file copy

Where to cache?
Disk: Pros: more reliable; data present locally on recovery
Memory: Pros: supports diskless workstations, quicker data access;
servers maintain their caches in memory
11

File Sharing Semantics


Other approaches:
Write through caches:
Immediately propagate changes to cached files to the server
Reliable but poor performance

Delayed write:

Writes are not propagated immediately; they are flushed later, typically on file close


Session semantics (AFS): write file back on close
Alternative (NFS): scan cache periodically and flush modified blocks
Better performance but poor reliability

File Locking:
The upload/download model locks a downloaded file
Other processes wait for file lock to be released
12

Network File System (NFS)


Developed by Sun Microsystems in 1984
Used to join FSes on multiple computers as one logical whole

Used commonly today with UNIX systems


Assumptions
Allows arbitrary collection of users to share a file system
Clients and servers might be on different LANs
Machines can be clients and servers at the same time

Architecture:
A server exports one or more of its directories to remote clients
Clients access exported directories by mounting them
The contents are then accessed as if they were local
13

Example

14
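
As a concrete illustration of exporting and mounting (hypothetical server name and paths, Linux-style syntax):

# On the server, in /etc/exports: export /home/shared read-write
/home/shared  *(rw)

# On the client: mount the exported directory into the local tree
mount -t nfs server:/home/shared /mnt/shared

From then on, files under /mnt/shared are accessed as if they were local.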

NFS Mount Protocol


Client sends a path name to the server with a request to mount
The client need not tell the server where it will mount the directory

If path is legal and exported, server returns file handle


Contains FS type, disk, i-node number of directory, security info
Subsequent accesses from client use file handle

Mounting can happen either at boot time or via automount


Using automount, directories are not mounted during boot
The OS sends a message to the server on the first remote file access
Automount is helpful since a remote dir might not be used at all

Mount only affects the client view!


15

NFS Protocol
Supports directory and file access via remote procedure
calls (RPCs)
All UNIX system calls supported other than open & close
Open and close are intentionally not supported

For a read, client sends lookup message to server


Server looks up file and returns handle
Unlike open, lookup does not copy info into internal system tables
Subsequently, each read contains the file handle, offset, and number of bytes
Each message is self-contained

Pros: server is stateless, i.e. no state about open files


Cons: Locking is difficult, no concurrency control
16
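
To make the statelessness concrete, here is a hypothetical sketch of what a self-contained read request carries (field names and sizes are illustrative, not the actual NFS wire format):

/* Every read names the file (by handle), the offset, and the byte
 * count, so the server keeps no per-client open-file state. */
#include <stdint.h>

struct nfs_fhandle {
    uint8_t data[32];        /* opaque to the client; returned by lookup */
};

struct nfs_read_request {
    struct nfs_fhandle fh;   /* which file */
    uint64_t offset;         /* where in the file to start */
    uint32_t count;          /* how many bytes to read */
};

/* A client reads sequentially just by bumping `offset` between
 * requests; the server never remembers the previous one. */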

NFS Implementation
Three main layers:
System call layer:
Handles calls like open, read and close

Virtual File System Layer:


Maintains table with one entry (v-node) for each open file
v-nodes indicate if file is local or remote
If remote, it has enough info to access the file
For local files, FS and i-node are recorded

NFS Service Layer:


This lowest layer implements the NFS protocol

17
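
A rough C sketch of the local/remote split a v-node records (illustrative layout, not an actual UNIX definition):

/* The VFS layer keeps one v-node per open file and dispatches each
 * operation to the local FS or to the NFS client code by the tag. */
#include <stdint.h>

enum vnode_kind { VN_LOCAL, VN_REMOTE };

struct vnode {
    enum vnode_kind kind;
    union {
        struct { uint32_t fs_id; uint32_t inode; } local;  /* FS + i-node */
        struct { uint8_t fhandle[32]; } remote;  /* enough to reach server */
    } u;
};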

NFS Layer Structure

18

How NFS Works


Mount:

System administrator calls the mount program with the remote and local directories


The mount program parses out the name of the NFS server
It contacts the server, asking for a file handle for the remote dir
If the directory exists and is exported, the server returns a handle
Client kernel constructs v-node for remote dir
Asks NFS client code to construct r-node for file handle

Open:
Kernel realizes that file is on remotely mounted directory
Finds r-node in v-node for the directory
NFS client code then opens file, enters r-node for file in VFS, and
returns file descriptor for remote node
19

Cache coherency
Clients cache file attributes and data
If two clients cache the same data, cache coherency can be lost

Solutions:
Each cache block has a timer (3 sec for data, 30 sec for dir)
Entry is discarded when timer expires

On open of a cached file, its last modification time on the server is checked


If cached copy is old, it is discarded

Every 30 sec, a cache timer expires


All dirty blocks are written back to the server

20
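
The timer rule can be summed up in a few lines of C (a sketch; names are hypothetical):

#include <stdbool.h>
#include <time.h>

struct cache_block {
    time_t fetched_at;       /* when this block was last validated */
    bool   is_directory;
};

/* Data blocks live 3 s, directory blocks 30 s; afterwards the
 * entry is discarded and must be re-fetched from the server. */
bool still_valid(const struct cache_block *b, time_t now)
{
    time_t ttl = b->is_directory ? 30 : 3;
    return (now - b->fetched_at) < ttl;
}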

Andrew File System (AFS)


Named after Andrew Carnegie and Andrew Mellon
Transarc Corp. and then IBM took over development of AFS
In 2000 IBM made OpenAFS available as open source

Features:

Uniform name space


Location independent file sharing
Client side caching with cache consistency
Secure authentication via Kerberos
Server-side caching in form of replicas
High availability through automatic switchover of replicas
Scalability to span 5000 workstations
21

AFS Overview
Based on the upload/download model
Clients download and cache files
Server keeps track of clients that cache the file
Clients upload files at end of session

Whole-file caching is the central idea behind AFS


Later amended to block operations
Simple, effective

AFS servers are stateful


Keep track of clients that have cached files
Recall files that have been modified

22
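
A minimal sketch of session semantics from the client's side, assuming hypothetical fetch/store helpers:

#include <stdbool.h>

struct cached_file {
    char local_path[256];   /* whole-file copy in the local cache */
    bool dirty;             /* set by any local write */
};

void fetch_whole_file(struct cached_file *f, const char *remote_name);
void store_whole_file(const struct cached_file *f);

/* On close, a modified file is shipped back in its entirety; other
 * clients observe the changes only after this point. */
void afs_close(struct cached_file *f)
{
    if (f->dirty)
        store_whole_file(f);
}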

AFS Details
Has dedicated server machines
Clients have partitioned name space:
Local name space and shared name space
A cluster of dedicated servers (Vice) presents the shared name space
Clients run the Virtue protocol to communicate with Vice

Clients and servers are grouped into clusters


Clusters connected through the WAN

Other issues:
Scalability, client mobility, security, protection, heterogeneity

23

AFS: Shared Name Space


AFS's storage is arranged in volumes
Usually associated with the files of a particular client

An AFS dir entry maps Vice files/dirs to a 96-bit fid


Volume number
Vnode number: index into i-node array of a volume
Uniquifier: allows reuse of vnode numbers

Fids are location transparent


File movements do not invalidate fids

Location information kept in volume-location database


Volumes are migrated to balance available disk space and utilization
Volume movement is atomic; operation aborted on server crash
24
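
Since the fid layout above is three 32-bit fields, it maps directly onto a struct (field names are illustrative):

#include <stdint.h>

/* 96-bit AFS fid: says *which* object, never *where* it is; the
 * volume-location database resolves volume -> server. */
struct afs_fid {
    uint32_t volume;       /* which volume the object lives in */
    uint32_t vnode;        /* index into the volume's i-node array */
    uint32_t uniquifier;   /* lets vnode slots be reused safely */
};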

AFS: Operations and Consistency


AFS caches entire files from servers
Client interacts with servers only during open and close

OS on the client intercepts calls and passes them to Venus


Venus is a client process that caches files from servers
Venus contacts Vice only on open and close
Venus does not contact Vice if the file is already in the cache and has
not been invalidated
Reads and writes bypass Venus

Works due to callback:


Server updates state to record caching
Server notifies client before allowing another client to modify
Clients lose their callback when someone writes the file

Venus caches dirs and symbolic links for path translation


25
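
A small C sketch of the server-side callback break described above (all names hypothetical):

#include <stdbool.h>
#include <stddef.h>

struct callback { int client_id; bool valid; };

struct file_state {
    struct callback cbs[64];   /* clients known to cache this file */
    size_t n_cbs;
};

/* Before granting `writer` the right to modify the file, notify
 * every other caching client that its copy is no longer valid. */
void break_callbacks(struct file_state *f, int writer,
                     void (*notify)(int client_id))
{
    for (size_t i = 0; i < f->n_cbs; i++) {
        if (f->cbs[i].client_id != writer && f->cbs[i].valid) {
            notify(f->cbs[i].client_id);   /* client loses its callback */
            f->cbs[i].valid = false;
        }
    }
}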

AFS Implementation
Client cache is a local directory on UNIX FS
Venus and server processes access files directly by their UNIX i-node

Venus has 2 caches, one for status & one for data
Uses LRU to keep them bounded in size

26

Summary
LFS:
Local file system
Optimize writes

NFS:
Simple distributed file system protocol. No open/close
Stateless server
Has problems with cache consistency, locking protocol

AFS:
More complicated distributed file system protocol
Stateful server
Session semantics: consistency on close
27

Enjoy Spring Break!!!

28

Storage Area Networks (SANs)


New generation of architectures for managing storage in
massive data centers
For example, Google is said to have 50,000-200,000 computers
in various centers
Amazon is reaching a similar scale

A SAN system is a collection of file systems with tools to help humans
administer the system

29

Examples of SAN issues


Where should a file be stored?
Many of these systems have an indirection mechanism so that a
file can move from volume to volume
Allows files to migrate, e.g. from a slow server to a fast one or
from long-term storage onto an active disk system

Eco-computing: systems that seek to minimize energy use in big data centers

30

Examples of SAN issues


Disk-to-disk backup
Might want to do very fast automated backups
Ideally, can support this while the disk is actively in use

Easiest if two disks are next to each other


Challenge: back up an entire data center in New York at a site in Kentucky
US Dept of Treasury e-Cavern

31

File System Reliability


2 considerations: backups and consistency
Why back up?
Recover from disaster
Recover from stupidity

Where to back up? Tertiary storage


Tape: holds 10s or 100s of GBs, costs pennies/GB
Sequential access; high random access time

Backup takes time and space

32

Backup Issues
Should the entire FS be backed up?
Binaries, special I/O files usually not backed up

Do not back up files unmodified since the last backup


Incremental dumps: a complete dump per month, modified files daily

Compress data before writing to tape


How to back up an active FS?
Not acceptable to take the system offline during backup hours

Security of backup media

33

Backup Strategies
Physical Dump
Start from block 0 of disk, write all blocks in order, stop after last
Pros: Simple to implement, speed
Cons: cannot skip directories, make incremental dumps, or restore individual files
No point in dumping unused blocks, but avoiding them adds overhead
How to handle bad blocks?

Logical Dump

Start at a directory
Dump all directories and files changed since a base date
The base date could be that of the last incremental dump, last full dump, etc.
Also dump all dirs (even unmodified ones) on the path to a modified file
34

Logical Dumps
Why dump unmodified directories?
To restore files on a fresh FS
To incrementally recover a single file

[Figure: example file tree for a logical dump; one label marks a file that has not changed]

35

A Dumping Algorithm

Algorithm:

Mark all directories & modified files


Unmark directories with no modified files
Dump the directories
Dump the modified files

36
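
A sketch of the mark phase in C, assuming an in-memory tree and a base date (hypothetical structure; a real dump walks the on-disk file system):

#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct node {
    bool is_dir;
    bool marked;
    time_t mtime;                /* last modification time */
    struct node *children[64];   /* empty for plain files */
    size_t n_children;
};

/* Mark modified files and every directory on a path to one;
 * returns true if this subtree contains anything to dump. */
bool mark(struct node *n, time_t base_date)
{
    if (!n->is_dir) {
        n->marked = (n->mtime > base_date);
        return n->marked;
    }
    bool any = false;
    for (size_t i = 0; i < n->n_children; i++)
        if (mark(n->children[i], base_date))
            any = true;
    n->marked = any;   /* unmarked dirs hold no modified files */
    return any;
}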

Logical Dumping Issues

Reconstruct the free block list on restore


Maintain consistency across symbolic links
Handle UNIX files with holes
Never dump special files, e.g. named pipes

37
