12 FileDirectory
2024/25 COMP3230B
Contents
• What is a file?
• What is a directory?
Related Learning Outcome
Readings & References
• Required Reading
• Chapter 39 – Interlude: Files and Directories
• http://pages.cs.wisc.edu/~remzi/OSTEP/file-intro.pdf
• References
• Chapter 36 – I/O Devices
• http://pages.cs.wisc.edu/~remzi/OSTEP/file-devices.pdf
• Chapter 37 – Hard Disk Drives
• http://pages.cs.wisc.edu/~remzi/OSTEP/file-disks.pdf
• Chapter 44 – Flash-based SSDs
• http://pages.cs.wisc.edu/~remzi/OSTEP/file-ssd.pdf
• How do SSDs work?
• http://www.extremetech.com/extreme/210492-extremetech-explains-how-do-ssds-work
Secondary Storage
• Most secondary storage devices involve magnetic disks, which are
random-access storage
• Data can be accessed by the read-write head in any order
Physical layout of HDD
• A disk consists of a number of
magnetic platters with recording
surfaces on both sides
• Rotate on spindle
• Each surface is divided into a
number of concentric tracks
• Each track is divided into a number
of sectors
• Vertical sets of tracks form
cylinders
Physical layout of HDD
Performance Characteristics of HDD
• The data in a particular disk sector can be read/written
• To access a data block
• The disk arm must move to the target track; the disk then rotates to bring the target sector under the
read-write head; finally, the record is read from or written to the disk
• Performance characteristics
• Seek time
• Time for the read-write head to move from its current location to the target track
• Average seek time is around 0.5 to 2 milliseconds
• Rotational latency
• Time for the platter to rotate until the target sector is underneath the read-write head
• Depends on the spinning rate; roughly 2 ms
• Transfer time
• Time for the platter to rotate further so that the entire sector passes under the head and the data is transferred
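The three components above add up to the time needed to access one block. A minimal sketch of that arithmetic follows; the function names, the 15,000 RPM spindle speed, the 100 MB/s transfer rate and the 4 KB block size are illustrative assumptions, not figures from the slides. It also assumes that, on average, the target sector is half a rotation away.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in ms, assuming a wait of half a rotation."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2.0


def transfer_time_ms(num_bytes, rate_mb_per_s):
    """Time in ms to transfer num_bytes at a sustained rate (1 MB = 10**6 bytes)."""
    return num_bytes / (rate_mb_per_s * 1_000_000.0) * 1000.0


def access_time_ms(seek_ms, rpm, num_bytes, rate_mb_per_s):
    """Access time = seek time + rotational latency + transfer time."""
    return (seek_ms
            + avg_rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, rate_mb_per_s))


if __name__ == "__main__":
    # Hypothetical drive: 2 ms seek, 15,000 RPM, 100 MB/s, one 4 KB block.
    print(avg_rotational_latency_ms(15_000))         # 2.0 ms (one rotation takes 4 ms)
    print(access_time_ms(2.0, 15_000, 4096, 100.0))  # about 4.04 ms in total
```

With these assumed numbers the mechanical parts (seek and rotation) account for almost all of the access time, which is why the next slide focuses on reducing mechanical motion.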
Performance Considerations of HDD
• Ways to improve disk I/O performance
• Disk scheduling
• In a multiprogramming environment, multiple processes can generate I/O requests at the same time,
so several pending requests may be queued up at the disk queue
• Which request should the system serve first?
• Because of the high cost of I/O, the OS historically played a role in deciding the order of I/Os issued to the disk
• The goal is to optimize data transfer by minimizing mechanical motion, i.e., seek time and rotational latency
• Caching
• A disk cache buffer (in main memory) is used to temporarily hold disk data
• Defragmentation
• Place related data in contiguous sectors
• Decreases number of seek operations required
• Multiple disks
• Disk I/O performance may be increased by spreading the operation over multiple disks
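To illustrate the disk scheduling point above, the sketch below compares serving requests in arrival order with a greedy "nearest track first" order (often called shortest-seek-time-first). The slides do not name a specific scheduling policy, so this choice, the function names and the track numbers are assumptions made for illustration only.

```python
def total_head_movement(start, order):
    """Sum of track distances travelled when serving requests in the given order."""
    total = 0
    position = start
    for track in order:
        total += abs(track - position)
        position = track
    return total


def nearest_first_order(start, requests):
    """Greedy order: always serve the pending request closest to the head."""
    pending = list(requests)
    position = start
    order = []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


if __name__ == "__main__":
    # Hypothetical queue of target tracks; the head starts at track 53.
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    print(total_head_movement(53, queue))                           # arrival order: 640
    print(total_head_movement(53, nearest_first_order(53, queue)))  # nearest first: 236
```

Reordering alone cuts the head movement in this example by more than half, although a greedy policy like this one can starve requests that are far from the head.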
Physical layout of SSD
• SSD storage medium is called NAND flash memory
Physical layout of SSD
• Blocks are then grouped into planes, and you’ll
find multiple planes on a single NAND-flash chip
Abstraction of Persistent Storage
• Two key abstractions – Files and Directories
• What is a file?
• From a user’s perspective, it is a collection of related information that is recorded on persistent
storage with a human-readable name given to it
• From the system’s perspective, it is a linear array of bytes, grouped in (logical) blocks, stored
somewhere, and identified by some kind of low-level id
• In Unix systems, this low-level id is called the inode number
• In Windows systems, it is called the file reference number
• This low-level id leads us to a data structure, where the attributes of the file are kept, e.g.,
locations of data of the file, ownership, etc.
• What is a directory?
• Actually, it is a file, but its file content is a mapping table that maps filenames (in that directory)
to their low-level ids
• One entry for each file in that directory; an entry can refer to a regular file or a directory file
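The name-to-id mapping described above can be observed on a Unix-like system. The following is a minimal sketch, assuming a POSIX file system where `st_ino` reports the inode number; the helper names and the file names `notes.txt` and `sub` are hypothetical.

```python
import os
import tempfile


def directory_table(path):
    """Return {name: inode number} for the entries of the directory at path.

    os.listdir() omits the special entries "." and "..".
    """
    table = {}
    for name in os.listdir(path):
        table[name] = os.lstat(os.path.join(path, name)).st_ino
    return table


def demo_directory_table():
    """Build a small temporary directory and return its name -> inode table."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "notes.txt"), "w") as f:
            f.write("hello")
        os.mkdir(os.path.join(d, "sub"))  # a directory is listed like any other entry
        return directory_table(d)


if __name__ == "__main__":
    for name, inode in sorted(demo_directory_table().items()):
        print(name, inode)
```

The actual inode numbers printed depend on the file system, so no specific values are shown here.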
File Systems
• Files (including directories) are managed by the OS, and the part of the OS dealing
with files is known as the file system
• File Management
• Providing services to users and applications in the use of files & directories
• Users should be able to refer to their files by symbolic names rather than having to use
physical device names and physical location
• Storage management
• Allocating space for files on storage devices
• File integrity
• To guarantee, to the extent possible, that the data in the file are valid
• Security
• Data stored in file systems should be subject to strict access controls
File Abstraction
• From a user’s standpoint, how to
• locate the file, name the file, access the file, protect the file
File Abstraction
• Each file has a set of attributes (metadata) associated with it:
• Name – human-readable name
• Low-level id – unique tag that identifies a file within the file system
• Location – pointer to storage locations of the data of the file on device
• Size – current file size
• Accessibility – restrictions placed on access to file data
• controls who can do Read, Write, Execute
• Time, date, and user identification – data for protection, security, and usage monitoring
:
:
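Several of the attributes listed above can be read back with a `stat` call. This is a minimal sketch assuming a POSIX system; the dictionary keys, the file name `notes.txt` and the permission bits are illustrative choices, and the storage locations of the data blocks are not exposed through this interface.

```python
import os
import stat
import tempfile


def describe(path):
    """Collect a few attributes of a file from its stat record."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),               # human-readable name
        "low_level_id": st.st_ino,                    # inode number
        "size": st.st_size,                           # current size in bytes
        "permissions": stat.filemode(st.st_mode),     # e.g. "-rw-r-----"
        "owner_uid": st.st_uid,                       # user identification
        "modified": st.st_mtime,                      # last modification time
    }


def demo_attributes():
    """Create a 5-byte file readable by owner and group, then describe it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "notes.txt")
        with open(path, "w") as f:
            f.write("hello")
        os.chmod(path, 0o640)  # owner: read/write, group: read, others: none
        return describe(path)


if __name__ == "__main__":
    for key, value in demo_attributes().items():
        print(key, value)
```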
Directories
• As mentioned, a directory is also a file
• The content of a directory can be seen as a symbol table, which associates file and
directory names within that directory with the corresponding directory entries
• Each entry stores the low-level id and other information of the file or directory
• Operations on directory
• Search for a file
• Create a directory
• Delete a directory
• List a directory
• Rename a file
Example directory content:
Name      Low-level ID
.         34
..        56
c0230a    123
c0234a    125
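The directory operations listed above map onto standard library calls. The sketch below runs them inside a temporary directory; the names `proj`, `a.txt` and `b.txt` are hypothetical, and the function is an illustration rather than how any particular file system implements these operations.

```python
import os
import tempfile


def demo_directory_operations():
    """Create, search, rename, list and delete inside a temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        proj = os.path.join(root, "proj")
        os.mkdir(proj)                                   # create a directory

        old_name = os.path.join(proj, "a.txt")
        new_name = os.path.join(proj, "b.txt")
        open(old_name, "w").close()                      # add one (empty) file

        found = "a.txt" in os.listdir(proj)              # search for a file by name
        os.rename(old_name, new_name)                    # rename a file
        listing = sorted(os.listdir(proj))               # list a directory

        os.remove(new_name)                              # directory must be empty first
        os.rmdir(proj)                                   # delete a directory
        gone = not os.path.exists(proj)
        return found, listing, gone


if __name__ == "__main__":
    print(demo_directory_operations())  # (True, ['b.txt'], True)
```

Note that renaming only changes the directory entry; the file's data is not touched.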
Directory Structure
• Hierarchically Structured File System
• By placing directories within other directories, we have a directory tree,
where all files and directories are stored
• A file system starts at a root directory “/”
• The root directory contains various directories in the directory hierarchy
• The full name of a file is usually formed as the pathname from the
root directory to the file – absolute path name
• e.g., /home/c3230a/src/ws5.cc
• Pros
• File names need to be unique only within a given directory
• Gives users more flexibility to name and group files
• Efficient searching – by simply traversing the path to locate the files
Figure: directory tree with nodes /, usr, home, c3230a, c3230b, bin, src, .bashrc, ws5.cc
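Traversing a path to locate a file amounts to one directory-table lookup per path component. The toy model below sketches this; the tree layout is an assumed reading of the slide's figure, and all inode numbers, the `DIRECTORIES` table and the `resolve` function are hypothetical.

```python
# Toy model: inode number of a directory -> its table (name -> inode number).
# Layout and inode numbers are made up for illustration.
ROOT_INODE = 2
DIRECTORIES = {
    2: {"usr": 10, "home": 11},                   # "/"
    10: {},                                       # /usr
    11: {"c3230a": 20, "c3230b": 21},             # /home
    20: {"bin": 30, "src": 31, ".bashrc": 32},    # /home/c3230a
    21: {},                                       # /home/c3230b
    30: {},                                       # /home/c3230a/bin
    31: {"ws5.cc": 40},                           # /home/c3230a/src
}


def resolve(path):
    """Walk an absolute path one component at a time and return its inode number."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path name")
    inode = ROOT_INODE
    for name in path.split("/"):
        if not name:                       # skip empty pieces from "/" separators
            continue
        table = DIRECTORIES.get(inode)
        if table is None:                  # current inode is not a directory
            raise NotADirectoryError(name)
        if name not in table:
            raise FileNotFoundError(name)
        inode = table[name]
    return inode


if __name__ == "__main__":
    print(resolve("/home/c3230a/src/ws5.cc"))  # 40
```

Each step searches only one directory, which is why file names need to be unique only within their own directory.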
Directory Structure
• To avoid always having to navigate with absolute path names, the concept
of a “working directory” (current directory) is used
• Enables users to specify a pathname that does not begin at the root
directory – relative path name
• Absolute path (i.e., the path beginning at the root) = working directory +
relative path name
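The rule "absolute path = working directory + relative path name" can be sketched with the standard `posixpath` module. The function name `to_absolute` and the example paths are assumptions for illustration. Note that `normpath` simplifies `..` purely textually, which may differ from what the OS does when symbolic links are involved.

```python
import posixpath


def to_absolute(working_directory, pathname):
    """Turn a path name into an absolute one, relative to working_directory."""
    if posixpath.isabs(pathname):
        # Already starts at the root: the working directory is not used.
        return posixpath.normpath(pathname)
    return posixpath.normpath(posixpath.join(working_directory, pathname))


if __name__ == "__main__":
    print(to_absolute("/home/c3230a", "src/ws5.cc"))      # /home/c3230a/src/ws5.cc
    print(to_absolute("/home/c3230a/src", "../.bashrc"))  # /home/c3230a/.bashrc
    print(to_absolute("/home/c3230a", "/usr"))            # /usr
```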
Directory Structure
• Hard link: create another directory entry that
maps to the same low-level id as the original file
• Unix – ln target new
• Windows – CreateHardLink
• The file is not copied at all; the system just creates
two directory entries at different locations that refer
to the same inode (file control block)
• Removing one directory entry will not cause the file
to be deleted
• The system keeps track of how many different directory
entries have been linked to the same low-level id –
reference count
• The system deletes the file only when the reference count
reaches zero
Figure: directory tree illustrating a hard link (nodes include /, usr, home, src, atctam, c3230a, linux, bin, .bashrc, mandel.c, memscan.c)
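The hard link behaviour described above can be checked on a Unix-like system. This is a minimal sketch assuming a file system that supports hard links; the file names are hypothetical, and `st_nlink` is used as the reference count.

```python
import os
import tempfile


def demo_hard_link():
    """Create a hard link, then remove the original directory entry."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "mandel.c")
        link = os.path.join(d, "mandel_link.c")
        with open(original, "w") as f:
            f.write("int main(void) { return 0; }\n")

        os.link(original, link)                    # same effect as: ln original link
        same_inode = os.stat(original).st_ino == os.stat(link).st_ino
        count_before = os.stat(original).st_nlink  # two entries refer to the inode

        os.remove(original)                        # removes one directory entry only
        count_after = os.stat(link).st_nlink       # one entry is left
        with open(link) as f:
            data = f.read()                        # the data is still reachable
        return same_inode, count_before, count_after, data


if __name__ == "__main__":
    print(demo_hard_link())
```

After the removal the file is still fully usable through the remaining name; neither name is "the original" as far as the file system is concerned.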
Directory Structure
• Limitation of hard link
• Can’t create a hard link to a directory
• Can’t create a hard link to files in other disk
partitions (i.e. another file system)
• Symbolic link (soft link): a file that stores the path name of the target
as its data
• Unix – ln -s target new
• Windows – mklink, Shortcut
• Symbolic link is a special file type
• Removing the original file leaves the soft link dangling
Figure: directory tree illustrating a symbolic link (nodes include /, usr, home, src, atctam, c3230a, linux, mandel.c)
atctam@atctam-LinuxPC:~/src> ls -l linux
lrwxrwxrwx 1 atctam users 14 Nov 30 16:48 linux -> /usr/src/linux
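In the listing above, the link's size (14) equals the length of the target path name /usr/src/linux, which reflects that a symbolic link stores a path rather than an inode reference. The sketch below shows the same behaviour, assuming a Unix-like system where creating symbolic links is permitted; the file names are hypothetical.

```python
import os
import tempfile


def demo_symbolic_link():
    """Create a symbolic link, inspect it, then remove its target."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        link = os.path.join(d, "link")
        with open(target, "w") as f:
            f.write("data")

        os.symlink(target, link)                 # same effect as: ln -s target link
        stored_path = os.readlink(link)          # the link's content is a path name
        is_link = os.path.islink(link)
        # On typical Unix file systems this equals the length of the stored path.
        link_size = os.lstat(link).st_size

        os.remove(target)                        # delete the original file
        # The link itself still exists, but following it now fails.
        dangling = os.path.islink(link) and not os.path.exists(link)
        return stored_path == target, is_link, dangling, link_size


if __name__ == "__main__":
    print(demo_symbolic_link())
```

Unlike a hard link, the symbolic link does not keep the file alive: once the target is removed, only a path to a non-existent file remains.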
Summary