File Systems2023Part1

Uploaded by

Tuấn Hiệp Nguyễn


File systems

• PART 1:
• Secondary storage devices
• Types of storage devices
• Disk structure
• Disk formatting
• Files
• File systems
• Definition of “file system”
• Address mapping
• Strategies for allocating disk space to files
• PART 2:
• Windows file systems
• FAT 12,16, 32
• NTFS
• Linux file systems
• Linux file structure on disk
• ext2
• Mounting a file system in Linux and Windows
• The boot sequence

• This section of the course is based in part on chapters 11, 13, 14 and 15 in the book Operating System Concepts, tenth ed., and chapter 4 in the book Modern Operating Systems, third ed.
Computer System Hardware
• Computer-system hardware
• One or more CPUs and device controllers connect through a common bus that provides access to memory and to the other I/O devices
Storage-device hierarchy
Types of secondary storage devices

• Hard Disk Drives (HDDs)

• Solid State Drives (SSDs)
Hard disk drives

• An HDD is a set of
spinning platters of
magnetically-coated
material under moving
read-write heads
• Components:
• Platters
• arms
• read-write heads
• tracks
• cylinders
• sectors
Hard disk drives

• The platters are double-sided circular disks, often made of aluminum, covered by a magnetic layer where data is stored.
• The platters are rotated by a spindle motor
• Like the platters, the read-write heads are a mobile component. They are steered by the arm assembly, which is driven by a second motor.
• The arm assembly moves the read-write head to a particular track, while the spindle rotates the platters so that the sector to be read is under the head
• The heads all move together, as do the platters.
• Important, the read-write heads
Hard disk drives

• The cylinder refers to the position of a given track on all the platters
• A sector is a section of a track.
• Each read or write operation touches at least one entire sector, i.e., the sector is the minimum unit of data read or written.
Characteristics impacting performance

• Transfer rate: the rate at which data flow between drive and computer
• Positioning time (random-access time):
• time to move the disk arm to the desired cylinder (seek time), and
• time for the desired sector to rotate under the disk head (rotational latency)
Hard Disk Drives characteristics

• Platters rotate 60 to 250 times per second (3,600 to 15,000 RPM)
• Platter diameters have ranged from 0.85” to 14” (historically)
• Commonly 3.5”, 2.5”, and 1.8”
• Storage from 30 GB to 18 TB per drive
• Performance
• Transfer rate (theoretical): 6 Gb/s
• Effective transfer rate (real): 1 Gb/s
• Seek time from 3 ms to 12 ms; 9 ms is common for desktop drives
• Average seek time is measured, or calculated based on 1/3 of tracks
• Latency is based on spindle speed
• Time for one rotation = 1 / (RPM / 60) = 60 / RPM seconds
• Average latency = ½ of the rotation time
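The latency formula above can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and the RPM values are just common spindle speeds chosen for the example.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation in milliseconds (60 / RPM seconds)."""
    return 60.0 / rpm * 1000.0


def average_rotational_latency_ms(rpm: float) -> float:
    """On average the desired sector is half a rotation away from the head."""
    return rotation_time_ms(rpm) / 2.0


# Example spindle speeds (assumed values, not taken from a specific drive).
for rpm in (5400, 7200, 15000):
    print(f"{rpm:>6} RPM: rotation {rotation_time_ms(rpm):.2f} ms, "
          f"average latency {average_rotational_latency_ms(rpm):.2f} ms")
```

Faster spindles shorten the wait for a sector, which is why rotational latency is tied directly to RPM.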
Factors impacting performance of HDDs

• Reading or writing data is a three-stage process:
• Seek time: position the head/arm over the proper track
• Rotational latency: wait for the desired sector to rotate under the r/w head
• Transfer time: transfer a block of bits (sector) under the r/w head

Request Time = Queueing Time + Controller Time + Seek + Rotational + Transfer
Figure: a request passes through the software queue (device driver), the hardware controller, and the media (Seek + Rot + Xfer) before the result is returned.
Example of current HDDs
• Seagate Exos X18 (2020)
• 18 TB hard disk
• 9 platters, 18 heads
• Helium filled: reduce friction and power
• 4.16ms average seek time
• 4096 byte physical sectors
• 7200 RPMs
• Dual 6 Gbps SATA /12Gbps SAS interface
• 270MB/s MAX transfer rate
• Cache size: 256MB
• Price: $ 562 (~ $0.03/GB)

• IBM Personal Computer/AT (1986)

• 30 MB hard disk
• 30-40ms seek time
• 0.7-1 MB/s (est.)
• Price: $500 ($17K/GB, over 500,000x more expensive per GB !!)
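As a rough illustration of Request Time = Queueing Time + Controller Time + Seek + Rotational + Transfer, the sketch below plugs in the Exos X18 figures listed above (4.16 ms average seek, 7200 RPM, 270 MB/s, 4096-byte sectors). Queueing and controller time are assumed to be zero, and the maximum transfer rate is used, so the result is optimistic.

```python
def request_time_ms(seek_ms, rpm, nbytes, rate_bytes_per_s,
                    queue_ms=0.0, controller_ms=0.0):
    """Estimate the time to service one disk request, in milliseconds."""
    rotational_ms = 0.5 * (60.0 / rpm) * 1000.0        # half a rotation on average
    transfer_ms = nbytes / rate_bytes_per_s * 1000.0   # time the data spends under the head
    return queue_ms + controller_ms + seek_ms + rotational_ms + transfer_ms


# One random read of a single 4096-byte sector (queue/controller time assumed 0).
exos_ms = request_time_ms(seek_ms=4.16, rpm=7200, nbytes=4096,
                          rate_bytes_per_s=270e6)
print(f"One random 4 KB read: about {exos_ms:.2f} ms")
print(f"Random reads per second: about {1000.0 / exos_ms:.0f}")
```

Seek and rotation dominate: the transfer itself takes only a few hundredths of a millisecond, which is why random access on an HDD is so much slower than sequential access.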
Solid State Drive
• Semiconductor-based storage device; data are stored in floating-gate transistors.
• This is a technology similar to main memory, except that the content is not lost when the system is turned off
• Each floating-gate transistor usually stores 1 bit.
• An SSD consists of an array of blocks, and each block is an array of pages (typically 4 KB), each made up of memory cells.
• An SSD does not have a mechanical arm to read and write data; instead, an embedded processor (called the controller) performs the operations related to reading and writing data.

NAND block with valid and invalid pages

Solid-state disk
• Can be more reliable than HDDs
• More expensive per MB
• May have a shorter life span, so it needs careful management
• But much faster
• Buses can be too slow, so some SSDs connect directly to PCI, for example

• No moving parts, so no seek time or rotational latency; SSDs are thus typically more resistant to physical shock, run silently, and have higher input/output rates and lower latency
Reading/writing in an SSD
• Data are not read or written one cell at a time but at the page level, i.e., when the controller reads, it reads one complete page (this is similar to reading one sector on an HDD)
• Writing is more complex.
• A page that has already been written cannot be overwritten in place; writing must be performed on a clean page.
• Thus, when changes must be made to a page, the current page is marked invalid and the new content is written to a clean page.
NAND block with valid and invalid pages
Erasing blocks in an SSD
• Blocks end up with a mix of valid and invalid pages
• Invalid pages must eventually be erased.
• However, erasing cannot be done at the page level; it must be done at the block level.
• Thus, to erase a block:
• the contents of the entire block must be copied into main memory,
• the block is erased,
• and then the content of the valid pages in the old block is written back to the newly erased block
NAND block with valid and invalid pages
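The write, invalidate and erase cycle can be sketched with a toy model of a single NAND block. This is only an illustration: the class and method names are hypothetical, the block has 4 pages for brevity, and a real controller spreads this work across many blocks.

```python
from enum import Enum


class PageState(Enum):
    CLEAN = "clean"
    VALID = "valid"
    INVALID = "invalid"


class NandBlock:
    """Toy model of one NAND block (hypothetical, for illustration only)."""

    def __init__(self, pages_per_block=4):
        self.states = [PageState.CLEAN] * pages_per_block
        self.data = [None] * pages_per_block
        self.erase_count = 0

    def write(self, content):
        """Write to the first clean page and return its index."""
        for i, state in enumerate(self.states):
            if state is PageState.CLEAN:
                self.states[i] = PageState.VALID
                self.data[i] = content
                return i
        raise RuntimeError("no clean page left: the block must be erased first")

    def update(self, page, content):
        """Pages cannot be overwritten: write a new page, invalidate the old one."""
        if self.states[page] is not PageState.VALID:
            raise ValueError("only a valid page can be updated")
        new_page = self.write(content)
        self.states[page] = PageState.INVALID
        return new_page

    def erase_and_rewrite(self):
        """Copy valid pages out, erase the whole block, write the valid pages back."""
        saved = [d for d, s in zip(self.data, self.states) if s is PageState.VALID]
        size = len(self.states)
        self.states = [PageState.CLEAN] * size
        self.data = [None] * size
        self.erase_count += 1
        for content in saved:
            self.write(content)


block = NandBlock()
block.write("A")
block.write("B")
block.update(0, "A2")      # page 0 becomes invalid, "A2" lands on page 2
block.write("C")           # block is now full, with one invalid page
block.erase_and_rewrite()  # reclaims the invalid page
print(block.data, block.erase_count)
```

The erase counter matters because each block tolerates only a limited number of erases before wearing out.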
Solid State disks limitations
• An SSD is not as fast as main memory, but it is much faster than an HDD
• However, SSDs have some storage and reliability challenges
• Reads and writes operate on “pages” (like sectors), but a page cannot be overwritten
• A page must first be erased, and erases can only be executed at the “block” level
• A block can only be erased a limited number of times before wearing out (~100,000)
• Life span is measured in drive writes per day (DWPD)
• A 1 TB NAND drive with a rating of 5 DWPD is expected to sustain 5 TB written per day throughout the warranty period without failing
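The DWPD rating translates directly into a write budget. The sketch below reproduces the 1 TB / 5 DWPD example; the 5-year warranty length is an assumption made for the calculation, not a figure from the slides.

```python
def daily_write_budget_tb(capacity_tb, dwpd):
    """Terabytes that may be written per day under the DWPD rating."""
    return capacity_tb * dwpd


def lifetime_write_budget_tb(capacity_tb, dwpd, warranty_years):
    """Total terabytes that may be written over the warranty period."""
    return daily_write_budget_tb(capacity_tb, dwpd) * 365 * warranty_years


# 1 TB drive rated at 5 DWPD; the 5-year warranty is an assumed value.
print(daily_write_budget_tb(1, 5), "TB per day")
print(lifetime_write_budget_tb(1, 5, 5), "TB over the warranty period")
```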
Some “current” (large) 3.5in SSDs

• Seagate Exos SSD: 15.36TB (2017)

• Seq reads 860MB/s
• Seq writes 920MB/s
• Price (Amazon): $5495 ($0.36/GB)

• Nimbus SSD: 100TB (2019)

• Seq reads/writes: 500MB/s
• Random Read Ops (IOPS): 100K
• Unlimited writes for 5 years!
• Price: ~ $40K? ($0.4/GB)
• 50TB drive costs $12500 ($0.25/GB)
HDD vs. SSD Comparison

HDD                                     | SSD
Requires seek + rotation                | No seeks
Not parallel (one head)                 | Parallel
Brittle (moving parts)                  | No moving parts
Random reads take tens of milliseconds  | Random reads take tens of microseconds
Slow (mechanical)                       | Wears out
Cheap/large storage                     | Expensive/smaller storage
Storing information
• Applications can store information in a process address space
• This is a bad idea, why?
• Storage size is limited to the size of the virtual address space
• May not be sufficient for large applications such as banking, etc.
• The data of the application is lost when the process exits or when the computer crashes
• Multiple processes might want to access the same data, but cannot
• Rather, we want to store very large amounts of data that survive the processes using them and can be accessed concurrently by multiple processes
• Solution:
• Store information on disks in units called files
• Files are persistent; a file disappears only when its owner explicitly deletes it
• Files are managed by the OS. How? Through the file system, the part of the OS that manages files
Files
• Files are logical units of information created by processes.
• When a process creates a file, it gives the file a name.
• When the process terminates, the file continues to exist and
can be accessed by other processes using its name.
• So, files provide a way to store information on the disk and
read it back later.
File naming
• The exact rules for file naming vary among file systems,
but all current file systems allow at least strings of one to
eight characters as legal file names.
• andrea, bruce, and cathy are possible file names.
• Digits and special characters are also permitted, like 2,
urgent!, and Fig.2-14
• Many file systems support names as long as 255
characters.
• Some file systems distinguish between upper- and
lowercase letters (UNIX), whereas others do not (MS-DOS)
File extensions
• Many file systems support two-part file names, with the two
parts separated by a period, as in prog.c.
• The part following the period is called the file extension and
usually indicates something about the file.
• In MS-DOS, for example, file names are 1 to 8 characters, plus
an optional extension of 1 to 3 characters.
• In UNIX, the size of the extension, if any, is up to the user, and
a file may even have two or more extensions:
homepage.html.zip
• UNIX file extensions are just conventions and are not enforced
by the operating system.
• However, some programs rely on them: a C compiler might insist that its source files end in .c
• Windows is aware of the extensions and assigns meaning to them. When a user double-clicks on a file name, the program assigned to its file extension is launched with the file as a parameter.
Typical file extensions

Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
File access
• Sequential access:
• read all bytes/records from the beginning
• cannot jump around, could rewind or back up
• convenient when medium was tapes
• Random access:
• bytes/records read in any order
• read can be …
• move file marker (seek), then read or …
• read and then move file marker
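The difference between the two access modes can be sketched in Python with a file of fixed-size records. The 8-byte record layout and the helper names are assumptions made for this example; `seek` moves the file marker before the read.

```python
import os
import tempfile

RECORD_SIZE = 8  # bytes per record (an assumption for this sketch)


def make_records(n):
    """Build n fixed-size records such as b'rec0003\\n'."""
    return [f"rec{i:04d}\n".encode("ascii") for i in range(n)]


def read_sequential(path):
    """Sequential access: read every record from the beginning, in order."""
    records = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD_SIZE)
            if not chunk:
                break
            records.append(chunk)
    return records


def read_random(path, index):
    """Random access: move the file marker (seek), then read one record."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        return f.read(RECORD_SIZE)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "records.dat")
    with open(path, "wb") as f:
        f.writelines(make_records(10))
    print(len(read_sequential(path)), "records read sequentially")
    print("record 7 read directly:", read_random(path, 7))
```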
File attributes
• Every file has a name and its data.
• In addition, all file systems associate other information with each file, such as the date and time the file was last modified and the file’s size.
• These extra items are the file’s attributes or metadata.
• The list of attributes varies considerably from system to system
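On most systems these attributes can be inspected from a program. The following sketch uses Python's standard `os.stat`; the helper name `file_attributes` and the choice of which fields to show are illustrative.

```python
import os
import stat
import tempfile
import time


def file_attributes(path):
    """Return a few common attributes (metadata) of a file."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "last_modified": time.ctime(info.st_mtime),
        "permissions": stat.filemode(info.st_mode),
    }


# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    example_path = f.name
print(file_attributes(example_path))
os.remove(example_path)
```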
Basic file operations
• Create a file
• Write to a file
• Read from a file
• Seek to somewhere in a file
• Delete a file
• Truncate a file
• Rename a file
• Append to a file
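Each of these operations maps onto a call that most languages expose. Below is a minimal Python sketch that runs them in sequence on a scratch file; the file names are arbitrary examples.

```python
import os
import tempfile


def run_basic_operations(directory):
    """Exercise create, write, read, seek, append, truncate, rename and delete."""
    path = os.path.join(directory, "notes.txt")
    renamed = os.path.join(directory, "notes-old.txt")

    with open(path, "wb") as f:     # create + write
        f.write(b"hello world")
    with open(path, "rb") as f:     # seek + read
        f.seek(6)
        tail = f.read()
    with open(path, "ab") as f:     # append
        f.write(b"!!!")
    with open(path, "r+b") as f:    # truncate to the first 5 bytes
        f.truncate(5)
    os.rename(path, renamed)        # rename
    with open(renamed, "rb") as f:
        remaining = f.read()
    os.remove(renamed)              # delete
    return tail, remaining, os.path.exists(renamed)


with tempfile.TemporaryDirectory() as directory:
    print(run_basic_operations(directory))
```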
Directories
• To keep track of files, file systems normally have directories,
which are themselves files.
• A directory is a file with a special structure
• Two types of directories: Single-Level Directory and
Hierarchical directory
• Single-Level Directory Systems: one directory containing all
the files
• Was common in early personal computers
• Pros: simplicity, ability to quickly locate files
• Cons: inconvenient naming (uniqueness, all files must have
different names)
Hierarchical directory systems
• It is a tree where leaves are data-files and internal nodes are directory-files
• Each directory-file holds entries that point to a mix of data-files and subdirectories
• The tree has a root node which is a directory, the root directory
• This solves name collisions: a file is identified by its whole path, so two files in different directories can share a name
Path names
• When the file system is based on a directory tree, a file is identified using its name and a path in the tree.
• Two different methods:
• Absolute path name: the path from the root directory to the file, for example /usr/ast/mailbox
• Relative path name: used in conjunction with the working directory (also called the current directory). The path is interpreted starting from the working directory, so with working directory /usr/ast the relative path mailbox names the same file
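How a relative path is combined with the working directory can be sketched with Python's standard `posixpath` module. The helper name `resolve` is illustrative, and the paths other than /usr/ast/mailbox are made-up examples.

```python
import posixpath


def resolve(path, working_directory):
    """Turn a path into an absolute path, using the working directory if needed."""
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(working_directory, path))


print(resolve("/usr/ast/mailbox", "/tmp"))           # absolute: working directory ignored
print(resolve("mailbox", "/usr/ast"))                # relative to /usr/ast
print(resolve("../lib/dictionary", "/usr/ast"))      # ".." climbs to the parent directory
```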
File systems implementations
File system implementation
• The most important task of a file system is the
allocation of hard disk storage to files that are
created by processes.
• Lesser but still important tasks are to implement
the other operations on files such as Write, Read,
Delete, Rename a file, etc.

• Before creating a file system, the hard disk must be formatted
Disk formatting
• A new storage device is a blank slate: it is just a platter of magnetic recording material (HDD) or a set of uninitialized semiconductor storage cells (SSD)
• Before a storage device can store data, it must be divided into sectors that the disk controller can read and write. This is called low-level formatting, or physical formatting
• Each sector holds header information, plus data, plus an error-correcting code (ECC)
• The data area is usually 512 bytes or 4 KB
• Most drives are low-level-formatted at the factory as a part of the manufacturing process.
Disk formatting: partition

• The next step is disk partitioning.
• Partition: the disk is partitioned into one or more groups of cylinders, each treated as a logical disk
• For instance, one partition can hold a file system containing a copy of the operating system’s executable code, another the swap space, and another a file system containing the user files
• The partition information is written in a fixed format at a fixed location on the storage device
Creation of the file system
• The last step is logical formatting, or creation of a file system.
• As a partition is a contiguous set of blocks on a drive that is treated as an independent disk, there must be a file system for each partition
• In this step, the operating system stores the initial file-system data structures, including:
• The superblock, which contains a magic number to identify the file-system type, the number of blocks in the file system, and other key administrative information
• A bitmap of free blocks in the file system
• The i-nodes or the FAT, which are used to allocate HDD storage to files
• The root directory of the file system
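To make the role of the magic number concrete, here is a toy superblock written and read back with Python's `struct` module. The field layout, the magic value and the 1-KB block size are all invented for this sketch and do not correspond to any real file system.

```python
import struct

MAGIC = 0x53465331                # arbitrary made-up magic number
SUPERBLOCK_FORMAT = "<IIII"       # magic, block size, total blocks, free blocks
BLOCK_SIZE = 1024                 # assumed block size in bytes


def make_superblock(total_blocks, free_blocks):
    """Pack the administrative fields and pad them to one full block."""
    packed = struct.pack(SUPERBLOCK_FORMAT, MAGIC, BLOCK_SIZE,
                         total_blocks, free_blocks)
    return packed.ljust(BLOCK_SIZE, b"\x00")


def read_superblock(block):
    """Check the magic number, then return the administrative fields."""
    magic, block_size, total, free = struct.unpack_from(SUPERBLOCK_FORMAT, block)
    if magic != MAGIC:
        raise ValueError("unrecognized file-system type")
    return {"block_size": block_size, "total_blocks": total, "free_blocks": free}


superblock = make_superblock(total_blocks=4096, free_blocks=4000)
print(read_superblock(superblock))
```

Checking the magic number first is what lets the OS refuse to mount a partition that holds a different file-system type, or no file system at all.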
Free blocks
• After the superblock comes information about free blocks in the file system, in the form of a bitmap or a list of pointers.
• First, we must understand how HDD storage is represented in the OS
OS representation of HDD storage

• The OS represents the hard disk drive as a logical sequence of “blocks”
• Logical block addressing (LBA) is a scheme to specify the location of each block of data on the HDD
• Indexes in a one-dimensional array are converted by the disk controller into cylinder, head, sector (CHS) values indicating where the block at index ‘i’ is stored on the HDD
• A block is the OS representation of the smallest logical data unit. Data are transferred in block units
• The smallest physical unit on disk drives is the sector
• Sectors usually hold 512 bytes or 4 KB
• A block is one or a sequence of sectors.
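The LBA-to-CHS conversion can be sketched with the classic formula. The geometry below (16 heads, 63 sectors per track) is an assumed example; modern drives hide their real geometry behind the controller, so this only illustrates the principle.

```python
def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Convert a logical block address to (cylinder, head, sector)."""
    cylinder = lba // (heads_per_cylinder * sectors_per_track)
    head = (lba // sectors_per_track) % heads_per_cylinder
    sector = lba % sectors_per_track + 1   # sectors are conventionally numbered from 1
    return cylinder, head, sector


def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Inverse mapping, useful as a consistency check."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


HEADS, SECTORS = 16, 63   # assumed geometry for the example
for lba in (0, 62, 63, 1008):
    print(lba, "->", lba_to_chs(lba, HEADS, SECTORS))
```

Consecutive addresses fill a track first, then move to the next head on the same cylinder, and only then to the next cylinder, which keeps sequential reads from needing a seek.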
Keeping track of free blocks

• 1- Use a linked list of disk blocks, each block holding as many free disk block numbers as will fit:
• With a 1-KB block and a 32-bit disk block number, each block on the free list holds the numbers of 255 free blocks. (One slot is required for the pointer to the next block.)
• A 1-TB disk (about 1 billion 1-KB blocks) requires about 4 million blocks to store the list of free blocks
Keeping track of free blocks

• 2- Use a bitmap. A disk with n blocks requires a bitmap with n bits.
• Free blocks are represented by 1s in the map, allocated blocks by 0s
• For a 1-TB disk with 1-KB blocks, we need about 1 billion bits for the map, which requires around 130,000 1-KB blocks to store
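The sizes quoted for the two schemes can be reproduced with a short calculation, followed by a tiny bitmap allocator. This is a sketch under the slides' assumptions (1-KB blocks, 32-bit block numbers, a 1-TB disk taken as 2^30 blocks); the class name `FreeBitmap` is illustrative.

```python
BLOCK_SIZE = 1024            # bytes per block
BLOCK_NUMBER_SIZE = 4        # bytes per 32-bit block number
TOTAL_BLOCKS = 2 ** 30       # 1-TB disk with 1-KB blocks


def free_list_blocks(free_blocks):
    """Blocks needed by the linked free list (one slot per block is the next pointer)."""
    numbers_per_block = BLOCK_SIZE // BLOCK_NUMBER_SIZE - 1   # 255
    return -(-free_blocks // numbers_per_block)               # ceiling division


def bitmap_blocks(total_blocks):
    """Blocks needed by a bitmap with one bit per disk block."""
    bits_per_block = BLOCK_SIZE * 8
    return -(-total_blocks // bits_per_block)


class FreeBitmap:
    """Toy bitmap: 1 means free, 0 means allocated, as on the slide."""

    def __init__(self, n_blocks):
        self.bits = [1] * n_blocks

    def allocate(self):
        index = self.bits.index(1)   # raises ValueError when the disk is full
        self.bits[index] = 0
        return index

    def release(self, index):
        self.bits[index] = 1


print("free list:", free_list_blocks(TOTAL_BLOCKS), "blocks (disk entirely free)")
print("bitmap:   ", bitmap_blocks(TOTAL_BLOCKS), "blocks")
```

The free list shrinks as the disk fills up, whereas the bitmap has a fixed size; on a mostly empty disk the bitmap is about 32 times smaller.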
Allocation of blocks to files
• After the superblock and the free-block information come the structures used to allocate free blocks to files. Examples:
• i-nodes (Unix), an array of data structures, one per file, telling all about the file
• FAT (MS-DOS, early Windows versions), file allocation table
• NTFS (Windows)
• Next comes the root directory, which contains the top of the file-system tree.
• The remainder of the disk contains all the other directories and files.
File storage allocation methods
• Files are stored in blocks (one or more sectors) of the disk, so there must be a method to allocate blocks to files (similar to allocating main memory to processes)
• Allocation methods:
• Contiguous blocks
• Linked list of blocks
• Linked list using table
• I-nodes
Contiguous allocation
• The simplest allocation scheme is to store each file as a contiguous sequence of disk blocks.
• With 1-KB blocks, a 50-KB file would be allocated 50 consecutive blocks
• The directory entry of each file only needs to keep the address of the first block and the number of blocks in the file
• Drawback: over time, the disk becomes fragmented.
• Contiguous allocation in practice:
• At file creation time, a sequence of free blocks is allocated; the directory only remembers the address of the first block and the length
• File cannot grow beyond that size
• Fragmentation is a problem
• Free list
• Allocation may be by first fit or best fit
• Requires periodic compaction
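A small first-fit allocator shows how fragmentation arises with contiguous allocation. The 10-block disk and the function names are assumptions for the example; the free map is a plain list where True means free.

```python
def first_fit(free_map, length):
    """Return the start of the first run of `length` free blocks, or None."""
    run_start, run_length = None, 0
    for index, is_free in enumerate(free_map):
        if is_free:
            if run_length == 0:
                run_start = index
            run_length += 1
            if run_length == length:
                return run_start
        else:
            run_length = 0
    return None


def allocate(free_map, length):
    """Allocate `length` consecutive blocks; return the first block or None."""
    start = first_fit(free_map, length)
    if start is not None:
        for index in range(start, start + length):
            free_map[index] = False
    return start


def release(free_map, start, length):
    """Free the blocks of a deleted file."""
    for index in range(start, start + length):
        free_map[index] = True


disk = [True] * 10
a = allocate(disk, 3)       # blocks 0-2
b = allocate(disk, 4)       # blocks 3-6
c = allocate(disk, 3)       # blocks 7-9
release(disk, a, 3)
release(disk, c, 3)
print("free blocks:", sum(disk))                       # 6 blocks are free...
print("5-block file fits at:", allocate(disk, 5))      # ...but not contiguously
```

Six blocks are free, yet a 5-block file cannot be placed because the free space is split into two runs of three; compaction would be needed to merge them.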
Linked list allocation

• Here each file is stored as a linked list of disk blocks.
• The first word of each block is used as a pointer to the next one.
• The rest of the block is for data.
• The directory entry only needs the address of the first block
• Random access is slow: reaching block n requires reading the n blocks before it to chase the pointers
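The cost of random access under linked-list allocation can be made visible by counting block reads. In this sketch the "disk" is a dictionary mapping a block number to a (next pointer, data) pair; the block numbers and contents are hypothetical.

```python
def read_logical_block(disk, first_block, n):
    """Return (physical block, data, blocks read) for logical block n of a file."""
    current = first_block
    reads = 0
    for _ in range(n):
        next_block, _data = disk[current]   # must read the block to find its pointer
        reads += 1
        if next_block is None:
            raise IndexError("file has fewer blocks than requested")
        current = next_block
    _next, data = disk[current]
    reads += 1
    return current, data, reads


# Hypothetical file occupying physical blocks 4 -> 7 -> 2 -> 10.
disk = {
    4: (7, "A"),
    7: (2, "B"),
    2: (10, "C"),
    10: (None, "D"),
}
print(read_logical_block(disk, first_block=4, n=0))
print(read_logical_block(disk, first_block=4, n=3))
```

Reading the last block of the file touches every block before it, so the cost of a random access grows linearly with the position in the file.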
Linked list allocation using table

• Keep a table in main memory with one entry per block on the partition
• Store the address of the next block in the table
• Drawback: the table is too big. With a 1-TB partition and a 1-KB block size, the table needs 1 billion entries, one for each of the 1 billion disk blocks
• This “file allocation table” (FAT) comes in different forms in MS-DOS and Windows: FAT-12, FAT-16, FAT-32
• File Allocation Table (FAT)
• FAT in main memory
• The directory has a pointer into the starting block entry in the FAT for each file.
• FAT becomes big, uses too much main memory
Figure: a directory (columns: file name, start index) with entries /foo and /bar pointing into the FAT (columns: free/busy, next); chained entries hold the index of the next block, and -1 marks the end of a file.
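Following a chain through the table can be sketched as below. The directory contents and FAT entries are hypothetical values chosen for illustration, -1 is used as the end-of-chain marker, and the 4-byte entry size in the size estimate is an assumption.

```python
END_OF_CHAIN = -1


def file_blocks(fat, start):
    """Follow the FAT chain from a file's starting block to the end marker."""
    blocks = []
    current = start
    while current != END_OF_CHAIN:
        blocks.append(current)
        current = fat[current]    # in-memory lookup, no disk read needed
    return blocks


def fat_size_bytes(partition_bytes, block_bytes, entry_bytes):
    """Memory needed for a table with one entry per block."""
    return (partition_bytes // block_bytes) * entry_bytes


# Hypothetical directory and table: /foo uses blocks 30 and 70, /bar uses block 50.
directory = {"/foo": 30, "/bar": 50}
fat = {30: 70, 70: END_OF_CHAIN, 50: END_OF_CHAIN}

for name, start in directory.items():
    print(name, "->", file_blocks(fat, start))
print("table for 1 TB, 1-KB blocks, 4-byte entries:",
      fat_size_bytes(2 ** 40, 2 ** 10, 4), "bytes")
```

Because the pointers live in memory, random access no longer requires reading the data blocks along the way, but the whole table has to stay resident.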
I-nodes
• An i-node lists the attributes and disk addresses of the file’s blocks
• There is one i-node per file. I-nodes are not kept permanently in main memory but stored on a specific set of blocks on the disk; when a file is opened, its i-node is loaded into main memory
• An i-node contains a fixed number of disk addresses; if the file is big, one address is reserved to point to a block that contains more disk addresses
• I-nodes are the method used in Unix/Linux operating systems; Windows has a similar system in NTFS
• Single level indexed (i-node) allocation
• Directory entries now point to the i-node for that file
• Since the i-node is a fixed size, there is a maximum file size
Figure: the directory (columns: file name, i-node address) maps /foo to i-node 30 and /bar to i-node 50; the i-node for /foo points to data blocks 100 and 201, and the i-node for /bar points to data block 99.
• Multilevel indexed allocation
• Make the i-node point to index blocks which point to the file’s data blocks (first-level indirection)
• May be extended to two-level (and beyond) indirection
• Problem: accessing even a small file requires a lot of indirection
Figure: the directory maps /foo to i-node 30; the i-node points to first-level index blocks 40 and 45; index block 40 points to data blocks 100 and 201, and index block 45 points to data block 299.
• Hybrid indexed allocation
• Two direct pointers for small files
• One single indirect
• One double indirect
• One triple indirect

• ls -i lists the i-node number of each file
• df -i shows how many i-nodes are used and free in the file system
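The maximum file size of the hybrid scheme follows from the number of pointers that fit in one index block. The sketch below assumes 1-KB blocks and 4-byte block addresses (256 pointers per block), which matches the block sizes used earlier in these slides but is not stated for this scheme.

```python
def max_file_blocks(direct_pointers, pointers_per_block):
    """Data blocks reachable through direct, single, double and triple indirection."""
    single = pointers_per_block
    double = pointers_per_block ** 2
    triple = pointers_per_block ** 3
    return direct_pointers + single + double + triple


BLOCK_SIZE = 1024                                # assumed block size in bytes
POINTERS_PER_BLOCK = BLOCK_SIZE // 4             # assumed 4-byte block addresses

blocks = max_file_blocks(direct_pointers=2, pointers_per_block=POINTERS_PER_BLOCK)
print("maximum file size:", blocks, "blocks =",
      round(blocks * BLOCK_SIZE / 2 ** 30, 2), "GiB")
```

Almost all of the capacity comes from the triple indirect pointer, while the direct pointers keep access to small files cheap.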
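Hybrid indexed allocation for a directory
Figure: the directory maps /foo to i-node 30. The i-node holds two direct pointers (data blocks 100 and 201), a single indirect pointer (index block 40, which points to data blocks 150 and 160), a double indirect pointer (index block 45, which points to index blocks 60 and 70, which in turn point to data blocks 299 and 399), and a triple indirect pointer.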
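Translating a logical block number of a file into a physical block under the hybrid scheme can be sketched as follows. The block numbers loosely follow the /foo example (i-node with direct blocks 100 and 201, single indirect block 40, double indirect block 45); each index block is assumed to hold only 2 pointers, data blocks 300 and 400 are hypothetical additions so that every index block is full, and the triple indirect level is omitted for brevity.

```python
POINTERS_PER_BLOCK = 2   # toy value so the example stays small
DIRECT_POINTERS = 2


def lookup(inode, index_blocks, n):
    """Return the physical block holding logical block n of the file."""
    if n < DIRECT_POINTERS:
        return inode["direct"][n]
    n -= DIRECT_POINTERS
    if n < POINTERS_PER_BLOCK:
        # One extra read: the single indirect index block.
        return index_blocks[inode["single"]][n]
    n -= POINTERS_PER_BLOCK
    if n < POINTERS_PER_BLOCK ** 2:
        # Two extra reads: the double indirect block, then a first-level index block.
        first_level = index_blocks[inode["double"]][n // POINTERS_PER_BLOCK]
        return index_blocks[first_level][n % POINTERS_PER_BLOCK]
    raise IndexError("beyond double indirection (triple indirect omitted here)")


inode = {"direct": [100, 201], "single": 40, "double": 45}
index_blocks = {
    40: [150, 160],
    45: [60, 70],
    60: [299, 300],   # 300 is a hypothetical block
    70: [399, 400],   # 400 is a hypothetical block
}
print([lookup(inode, index_blocks, n) for n in range(8)])
```

The first blocks of a file are reached without any extra disk read, and only large files pay for the additional levels of indirection.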