12 File Systems
Why disks are different
[Figure: memory vs. disk capacity by year, annotated “CRASH!”: disk state, unlike memory, survives a crash]
• Huge (64–1,000x bigger than memory)
- How to organize large collection of ad hoc information?
- File System: Hierarchical directories, Metadata, Search
Disk vs. Memory
                 Disk           TLC NAND Flash     DRAM
Smallest write   sector         sector             byte
Atomic write     sector         sector             byte/word
Random read      8 ms           3–10 µs            50 ns
Random write     8 ms           9–11 µs*           50 ns
Sequential read  200 MB/s       550–2500 MB/s      > 10 GB/s
Sequential write 200 MB/s       520–1500 MB/s*     > 10 GB/s
Cost             $0.01–0.02/GB  $0.06–0.10/GB      $2.50–5/GiB
Persistence      Non-volatile   Non-volatile       Volatile
Files: named bytes on disk
• File abstraction:
- User’s view: named sequence of bytes
• File operations:
- Create a file, delete a file
- Read from file, write to file
• Want: operations to have as few disk accesses as possible &
have minimal space overhead (group related things)
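A minimal sketch of these operations through the POSIX interface (the file name and contents are made up for illustration):

```c
/* Minimal sketch: a file is just a named sequence of bytes.
 * "notes.txt" and its contents are made up for illustration. */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* Create a file and write to it */
    int fd = open("notes.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return 1;
    write(fd, "hello\n", 6);
    close(fd);

    /* Read the bytes back */
    char buf[6];
    fd = open("notes.txt", O_RDONLY);
    read(fd, buf, sizeof buf);
    close(fd);

    /* Delete the file */
    unlink("notes.txt");
    return 0;
}
```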
What’s hard about grouping blocks?
FS vs. VM
Some working intuitions
• Sequential:
- File data processed in sequential order
- By far the most common mode
- Example: editor writes out new file, compiler reads in file, etc.
• Random access:
- Address any block in file directly without passing through
predecessors
- Examples: data set for demand paging, databases
• Keyed access:
- Search for block with particular values
- Examples: associative database, index
- Usually not provided by OS
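As a concrete sketch of the first two modes through POSIX calls (the file name is hypothetical); keyed access would be layered on top by a database or application:

```c
#include <fcntl.h>
#include <unistd.h>

void access_modes(void)
{
    char buf[4096];
    int fd = open("data.bin", O_RDONLY);    /* hypothetical file */

    /* Sequential: each read continues where the previous one stopped */
    while (read(fd, buf, sizeof buf) > 0)
        ;

    /* Random: jump straight to byte offset 1 MiB without reading
     * any of the preceding blocks */
    pread(fd, buf, sizeof buf, 1 << 20);

    close(fd);
}
```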
Problem: how to track file’s data
• Disk management:
- Need to keep track of where file contents are on disk
- Must be able to use this to map byte offset to disk block
- Structure tracking a file’s sectors is called an index node or inode
- Inodes must be stored on disk, too
• Things to keep in mind while designing file structure:
- Most files are small
- Much of the disk is allocated to large files
- Many of the I/O operations are made to large files
- Want good sequential and good random access
(what do these require?)
Straw man: contiguous allocation
• Cons?
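The usual answers: external fragmentation as files are created and deleted, and no easy way to grow a file in place. The pro is that the byte-offset-to-block mapping is pure arithmetic; a sketch, assuming the inode stores just a start block and a length (names are illustrative):

```c
/* Sketch: contiguous allocation needs only two inode fields,
 * and lookup is a single division. */
struct inode_contig {
    unsigned start;    /* first disk block of the file */
    unsigned nblocks;  /* file length in blocks */
};

unsigned offset_to_block(const struct inode_contig *ip,
                         unsigned byte_off, unsigned block_size)
{
    return ip->start + byte_off / block_size;
}
```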
Straw man #2: Linked files
• Basically a linked list on disk.
- Keep a linked list of all free blocks
- Inode contents: a pointer to file’s first block
- In each block, keep a pointer to the next one
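A sketch of lookup under this scheme, assuming each 512-byte block ends with a pointer to the next (read_block is a hypothetical helper that fetches one block from disk):

```c
/* Sketch: random access in a linked file is O(n), because every
 * predecessor block must be read just to learn where the next is. */
struct linked_block {
    char     data[508];
    unsigned next;       /* disk address of next block; 0 = end */
};

extern void read_block(unsigned blockno, struct linked_block *out);

unsigned block_of_offset(unsigned first_block, unsigned byte_off)
{
    struct linked_block b;
    unsigned blockno = first_block;

    for (unsigned i = 0; i < byte_off / sizeof b.data; i++) {
        read_block(blockno, &b);  /* one disk read per hop! */
        blockno = b.next;
    }
    return blockno;
}
```

(FAT keeps these next pointers in a separate table rather than inside the data blocks themselves.)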
FAT discussion
Another approach: Indexed files
• Pros?
- Both sequential and random access easy
• Cons?
- Mapping table requires a large chunk of contiguous space... the same problem we were trying to solve initially
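A sketch of that mapping table, assuming one flat array of block pointers per file (names are illustrative):

```c
/* Sketch: a flat index makes both sequential and random access a
 * single array lookup, but the array itself must be contiguous. */
struct inode_indexed {
    unsigned nblocks;
    unsigned blocks[];   /* blocks[i] = disk address of file block i */
};

unsigned lookup(const struct inode_indexed *ip,
                unsigned byte_off, unsigned block_size)
{
    return ip->blocks[byte_off / block_size];
}
```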
Multi-level indexed files (old BSD FS)
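A sketch of the lookup this implies, assuming a handful of direct pointers plus single and double indirect blocks (the real BSD inode also had a triple indirect pointer; the constants and read_block helper are illustrative):

```c
#define NDIRECT 12                            /* illustrative split */
#define BSIZE   4096
#define NPTR    (BSIZE / sizeof(unsigned))    /* pointers per indirect block */

struct inode_multi {
    unsigned direct[NDIRECT];  /* small files: no extra I/O at all */
    unsigned single;           /* block holding NPTR block pointers */
    unsigned dbl;              /* block of pointers to indirect blocks */
};

extern void read_block(unsigned blockno, void *out);  /* hypothetical */

/* Map a file block number to a disk block number. */
unsigned bmap(const struct inode_multi *ip, unsigned fileblock)
{
    unsigned ptrs[NPTR];

    if (fileblock < NDIRECT)                  /* common case: small file */
        return ip->direct[fileblock];
    fileblock -= NDIRECT;

    if (fileblock < NPTR) {                   /* one extra disk read */
        read_block(ip->single, ptrs);
        return ptrs[fileblock];
    }
    fileblock -= NPTR;

    read_block(ip->dbl, ptrs);                /* two extra disk reads */
    read_block(ptrs[fileblock / NPTR], ptrs);
    return ptrs[fileblock % NPTR];
}
```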
Old BSD FS discussion
• Pros:
- Simple, easy to build, fast access to small files
- Maximum file length fixed, but large.
• Cons:
- What is the worst-case # of accesses?
- What is the worst-case space overhead? (e.g., a 13-block file)
• An empirical problem:
- Because blocks are allocated by taking them off an unordered free list, metadata and data get strewn across the disk
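To make those worst cases concrete (a sketch, assuming 4 KB blocks, 4-byte block pointers, and 12 direct plus single/double/triple indirect pointers): reaching a byte in the triple-indirect range with a cold cache takes up to 5 disk reads: the inode, three indirect blocks, then the data block itself. For space overhead, a 13-block file spills one block past the direct pointers, so it must allocate an entire 4 KB indirect block just to hold a single 4-byte pointer.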
More about inodes
• Problem:
- “Spend all day generating data, come back the next morning, want
to use it.” – F. Corbató, on why files/dirs invented
• Approach 0: Users remember where on disk their files are
- E.g., like remembering your social security or bank account #
• Yuck. People want human-digestible names
- We use directories to map names to file blocks
• Next: What is in a directory and why?
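As a preview, a sketch of the classic answer: in early Unix (V7), a directory was just a file whose contents were an array of 16-byte entries mapping a name to an inode number:

```c
/* Sketch of a V7-style Unix directory entry: name -> inode number.
 * The directory itself is an ordinary file full of these. */
struct dirent_v7 {
    unsigned short d_ino;       /* inode number; 0 marks a free slot */
    char           d_name[14];  /* file name, NUL-padded */
};
```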
A short history of directories
Hierarchical Unix
Naming magic
Default context: working directory
Hard and soft links (synonyms)
• Unix file system components:
- Data blocks
- Inodes (directories represented as files)
- Hard links
- Superblock (specifies number of blocks in FS, max # of files, pointer to head of free list)
• Problem: slow
- Only gets 20 KB/s (2% of disk maximum) even for sequential disk transfers!
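A sketch of the superblock fields listed above (field names are made up; real layouts carry much more):

```c
/* Sketch: the superblock summarizes the whole file system. */
struct superblock {
    unsigned nblocks;    /* number of blocks in the FS */
    unsigned max_files;  /* max # of files (size of the inode table) */
    unsigned free_head;  /* pointer to the head of the free list */
};
```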
A plethora of performance costs
Solution: fragments
• BSD FFS:
- Has large block size (4096 or 8192 bytes)
- Allow large blocks to be chopped into small ones (“fragments”)
- Used for little files and pieces at the ends of files
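A worked example, assuming 4096-byte blocks and 1024-byte fragments: a 5,000-byte file takes one full block plus one fragment (5 KB of disk) instead of the 8 KB two whole blocks would cost, and a 500-byte file fits in a single fragment rather than wasting 7/8 of a 4 KB block.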
Clustering related objects in FFS
Clustering in FFS
[Figure: FFS disk layout, clustering bookkeeping information near the data it describes]
• Before FFS: free space tracked with a linked free list
- Bad: the free list gets jumbled over time, so finding adjacent blocks is hard and slow
• FFS: switch to bit-map of free blocks
- 1010101111111000001111111000101100
- Easier to find contiguous blocks.
- Small, so usually keep entire thing in memory
- Time to find free block increases if fewer free blocks
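A sketch of allocation from the bitmap, assuming bit i is 1 when block i is free (the polarity is our choice) and starting the search at a goal block, e.g., right after the file's previous block, to keep files contiguous:

```c
#include <stddef.h>

/* Sketch: scan the in-memory bitmap for a free block at or after
 * 'goal'; returns the block number or -1 if none found. */
long alloc_near(unsigned char *bitmap, size_t nblocks, size_t goal)
{
    for (size_t i = goal; i < nblocks; i++) {
        if (bitmap[i / 8] & (1u << (i % 8))) {   /* block i free? */
            bitmap[i / 8] &= ~(1u << (i % 8));   /* mark allocated */
            return (long)i;
        }
    }
    return -1;  /* caller could wrap around and retry from block 0 */
}
```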
Using a bitmap
• Performance improvements:
- Able to get 20-40% of disk bandwidth for large files
- 10-20x original Unix file system!
- Better small file performance (why?)
• Is this the best we can do? No.
• Block based rather than extent based
- Could have named contiguous blocks with a single pointer and length (Linux ext4fs, XFS); see the extent sketch after this list
• Writes of metadata done synchronously
- Really hurts small file performance
- Make asynchronous with write-ordering (“soft updates”) or logging/journaling... more next lecture
- Play with semantics (/tmp file systems)
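A sketch of the extent idea flagged above, assuming the inode stores a short array of (start, length) pairs instead of one pointer per block (names are illustrative):

```c
/* Sketch: one extent names a whole run of contiguous blocks. */
struct extent {
    unsigned start;   /* first disk block of the run */
    unsigned len;     /* number of contiguous blocks in the run */
};

/* Map a file block number to a disk block by walking the extents. */
long extent_lookup(const struct extent *ex, unsigned nextents,
                   unsigned fileblock)
{
    for (unsigned i = 0; i < nextents; i++) {
        if (fileblock < ex[i].len)
            return (long)(ex[i].start + fileblock);
        fileblock -= ex[i].len;
    }
    return -1;   /* offset past end of file */
}
```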
Other hacks
• Obvious:
- Big file cache
• Fact: no rotation delay if you read a whole track.
- How to use?
• Fact: transfer cost is negligible.
- Recall: can get 50x the data for only ∼3% more overhead
- 1 sector: 5 ms + 4 ms + 5 µs (≈ 512 B / (100 MB/s)) ≈ 9 ms
- 50 sectors: 5 ms + 4 ms + 0.25 ms = 9.25 ms
- How to use?
• Fact: if the transfer is huge, seek + rotation are negligible
- LFS: Hoard data, write out MB at a time
• Next lecture:
- FFS in more detail
- More advanced, modern file systems