OPS-Chapter-6-File Management
E-Notes
necessarily all the same length, each containing a key field in a fixed position in the record.
Three kinds of files: (a) Byte sequence. (b) Record sequence. (c) Tree.
Repositioning within a file: The directory is searched for the appropriate entry, and the current file
position is set to a given value. Repositioning within a file does not need to involve any actual I/O. This
file operation is also known as a file seek.
Deleting a file: To delete a file, we search the directory for the named file. Having found the
associated directory entry, we release all file space and erase the directory entry.
Truncating a file: Instead of deleting a file and then recreating it, this function allows all attributes to
remain unchanged while the file is reset to length zero. It is used when the user wants to erase the contents
of the file but keep the file itself.
Other common operations include appending new information to the end of an existing file, and
renaming an existing file.
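The operations above can be tried out on a scratch file. The following is only a minimal sketch using Python's standard library; the file names notes.txt and renamed.txt and the function name demo_file_operations are hypothetical, chosen for illustration.

```python
import os
import tempfile

def demo_file_operations():
    """Exercise seek, append, truncate, rename and delete on a scratch file."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "notes.txt")   # hypothetical file name

    with open(path, "wb") as f:                 # create the file and write to it
        f.write(b"0123456789")

    with open(path, "rb") as f:
        f.seek(4)                               # reposition: only the file position changes
        after_seek = f.read(3)                  # the read starts at byte 4

    with open(path, "ab") as f:                 # append new information at the end
        f.write(b"AB")
    size_after_append = os.path.getsize(path)

    with open(path, "r+b") as f:                # truncate: same file, length reset to zero
        f.truncate(0)
    size_after_truncate = os.path.getsize(path)

    new_path = os.path.join(workdir, "renamed.txt")
    os.rename(path, new_path)                   # rename the existing file
    os.remove(new_path)                         # delete: the directory entry is erased
    still_exists = os.path.exists(new_path)
    os.rmdir(workdir)
    return after_seek, size_after_append, size_after_truncate, still_exists
```

Calling demo_file_operations() returns the bytes read after the seek, the file size after appending, the size after truncation, and whether the file still exists after deletion.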
To read a piece of data that is stored at the end of the file, one has to read all of the data that comes before it;
one cannot jump directly to the desired data. This is similar to the way cassette tape players work. To listen
to the last song on a cassette tape, one has to either fast-forward over all of the songs that come before it
or listen to them. There is no way to jump directly to a specific song.
Advantages of sequential file:
Easy to access next record
Data organization is simple
Absence of data structure.
Sequential files are typically used in batch applications that involve the processing of all the
records (payroll, billing, etc.).
They are easily stored on tape as well as disk.
Automatic backup copy is created
Disadvantages of sequential file:
Wastage of memory space because of master file and transaction file
For interactive applications that involve queries and / or updates of individual records, the sequential
file provides poor performance.
The block number provided by the user to the OS is normally a relative block number. A relative block number
is an index relative to the beginning of the file. Thus, the first relative block of the file is 0, the next is 1, and so
on, even though the actual absolute disk address of the block may be 14703 for the first block and 3192 for
the second. The use of relative block numbers allows the OS to decide where the file should be placed (called
the allocation problem) and helps to prevent the user from accessing portions of the file system that may not
be part of her file. When you work with a direct access file (which is also known as a random access file), you
can jump directly to any piece of data in the file without reading the data that comes before it. This is similar
to the way a CD player or an MP3 player works. You can jump directly to any song that you want to listen to.
Sequential access files are easy to work with, and you can use them to gain an understanding of basic file
operations.
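The difference between the two access methods can be sketched in a few lines. This is an illustration only: the block size of 512 bytes, the in-memory demo_file and the function names are assumptions, not part of any particular operating system.

```python
import io

BLOCK_SIZE = 512  # assumed block size in bytes

def read_block(f, relative_block):
    """Direct access: jump straight to a relative block number (0 = first block)."""
    f.seek(relative_block * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)

def read_block_sequentially(f, relative_block):
    """Sequential access: every earlier block has to be read first."""
    f.seek(0)
    data = b""
    for _ in range(relative_block + 1):
        data = f.read(BLOCK_SIZE)
    return data

# An in-memory "file" of four blocks, each filled with its own block number.
demo_file = io.BytesIO(b"".join(bytes([n]) * BLOCK_SIZE for n in range(4)))
```

Both functions return the same block; the direct version simply skips the reads that the sequential version cannot avoid.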
Q. Explain Swapping
The Resident Monitor memory management scheme may seem of little use since it appears to be inherently
single user. These systems used a resident monitor with the remainder of memory available to the currently
executing user. When they switched to the next user, the current contents of user memory were written out
to a backing store (a disk or drum) and the memory of the next user was read in. This scheme is called
Swapping.
Backing Store:
Swapping requires a backing store. The backing store is commonly a fast drum or disk.
It must be large enough to accommodate copies of all memory images for all users, and must provide
direct access to these memory images.
All memory images of processes that are ready to run are kept on the backing store. Whenever the CPU
scheduler decides to execute a process, it calls the dispatcher.
The dispatcher checks whether that process is in memory. If not, it swaps out the process currently
in memory and swaps in the desired process.
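The dispatcher's decision can be modelled with a toy simulation. This is a sketch under simplifying assumptions: memory holds exactly one user process, a memory image is just a Python value, and the class name SwappingSystem is hypothetical.

```python
class SwappingSystem:
    """Toy model: one user area in memory, all memory images on the backing store."""

    def __init__(self, images):
        self.backing_store = dict(images)   # process name -> memory image
        self.in_memory = None               # name of the resident process
        self.memory = None                  # its memory image
        self.swaps = 0                      # number of swap-ins performed

    def dispatch(self, name):
        """Make `name` resident, swapping out the current process if needed."""
        if self.in_memory == name:
            return self.memory              # already in memory: no swapping
        if self.in_memory is not None:
            # swap out: write the current image back to the backing store
            self.backing_store[self.in_memory] = self.memory
        # swap in: read the desired image from the backing store
        self.memory = self.backing_store[name]
        self.in_memory = name
        self.swaps += 1
        return self.memory
```

Dispatching the same process twice in a row costs only one swap, while switching to another process forces a swap out followed by a swap in.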
VTP-SAV Operating System (22516) Chapter-6
Program: Computer Engineering (NBA Accredited)
Contiguous Allocation
The contiguous allocation method requires each file to occupy a set of contiguous addresses on the disk.
Disk addresses define a linear ordering on the disk.
With this ordering, accessing block b+1 after block b normally requires no head movement.
Contiguous allocation of a file is defined by the disk address of the first block and the length of the file
in blocks. If the file is n blocks long and starts at location b, then it occupies blocks b, b+1, b+2, …, b+n-1.
The directory entry for each file indicates the address of the starting block and the length of the area
allocated for this file.
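The address arithmetic for contiguous allocation is simple enough to write down directly. The following sketch uses hypothetical function names; the start block and length stand for the two values held in the directory entry.

```python
def contiguous_blocks(start, length):
    """Blocks occupied by a file that starts at block `start` and is `length` blocks long."""
    return list(range(start, start + length))

def physical_block(start, length, i):
    """Disk address of the i-th block of the file (i counts from 0)."""
    if not 0 <= i < length:
        raise IndexError("block index lies outside the file")
    return start + i
```

For a file that starts at block 14 and is 3 blocks long, contiguous_blocks(14, 3) gives blocks 14, 15 and 16, which matches b, b+1, …, b+n-1.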
After a free block is obtained, data is written to it and the block is linked to the end of the file.
To read the file, read the blocks by following the pointers from block to block, starting with the block
address specified in the directory entry.
For example, a file of five blocks might start at block 9 and continue with block 16, then block 1, then
block 10, and finally block 25. Each allocated block contains a pointer to the next block.
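The pointer-following scheme in the example (commonly called linked allocation) can be sketched as a chain walk. The END_OF_FILE marker and the dictionary representation of the per-block pointers are assumptions made for illustration.

```python
END_OF_FILE = -1  # assumed marker meaning "no next block"

# Pointer stored in each allocated block: block number -> next block number.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: END_OF_FILE}

def file_blocks(start):
    """Follow the chain from the starting block recorded in the directory entry."""
    blocks = []
    block = start
    while block != END_OF_FILE:
        blocks.append(block)
        block = next_block[block]
    return blocks
```

Starting from block 9, the walk visits the five blocks in the order given in the example. Reaching the last block requires visiting every block before it, which is why this scheme suits sequential access better than direct access.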
Indexed Allocation:
In this method, each file has its own index block.
This index block is an array of disk block addresses.
When a file is created, an index block and other disk blocks according to the file size are allocated to
that file.
A pointer to each allocated block is stored in the index block of that file.
The directory entry contains the file name and the address of the index block.
When any block is allocated to the file, its address is updated in the index block.
Any free disk block can be allocated to the file. Each ith entry in the index block points to the ith block
of the file. To find and read the ith block, we use the pointer in the ith index block entry.
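A small sketch of the index-block lookup follows. The disk contents, the file name "report" and the choice of block 19 as the index block are all hypothetical values used only to show the two-step lookup.

```python
# Hypothetical disk: block number -> contents of that block.
disk = {
    19: [9, 16, 1, 10, 25],   # block 19 is the index block: an array of block addresses
    9: "part0", 16: "part1", 1: "part2", 10: "part3", 25: "part4",
}

# Directory entry: file name -> address of the file's index block.
directory = {"report": 19}

def read_ith_block(name, i):
    """Read the i-th block of a file (i counts from 0) through its index block."""
    index_block = disk[directory[name]]   # first access: fetch the index block
    return disk[index_block[i]]           # second access: fetch the data block itself
```

Unlike pointer chaining, any block of the file is reached in two steps regardless of its position.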
Single Level Directory Structure: It is the simplest form of directory structure, having one directory containing
all the files. Software design is simple. The advantages of this scheme are its simplicity and the ability to
locate files quickly. Since all files are in the same directory, they must have unique names. If two users both
call their data file "test", then the unique-name rule is violated. Even with a single user, as the number of
files increases, it becomes difficult to remember the names of all the files in order to create files with
unique names.
Two level Directory Structure: In this structure, each user has their own user file directory (UFD). The UFD lists
only the files of a single user. The system contains a master file directory (MFD), which is indexed by user name
or account number. Each entry in the MFD points to the UFD for that user. When a user refers to a particular file,
only that user’s own UFD is searched. Different users can have files with the same name, as long as all the file
names within each UFD are unique. When a file is created for a user, the operating system searches only that
user’s UFD to check whether a file with the same name is already present. When a file is deleted, the operating
system again checks for the file name in that user’s UFD only.
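The MFD/UFD arrangement maps naturally onto a dictionary of dictionaries. The class below is a toy sketch, not a real file system interface; the class and method names are hypothetical.

```python
class TwoLevelDirectory:
    """Toy two-level directory: an MFD that maps each user to that user's UFD."""

    def __init__(self):
        self.mfd = {}                     # user name -> UFD (file name -> contents)

    def add_user(self, user):
        self.mfd.setdefault(user, {})     # a new user starts with an empty UFD

    def create(self, user, filename, contents=""):
        ufd = self.mfd[user]              # only this user's UFD is searched
        if filename in ufd:
            raise FileExistsError(f"{user} already has a file named {filename}")
        ufd[filename] = contents

    def delete(self, user, filename):
        del self.mfd[user][filename]      # again, only this user's UFD is touched
```

Two users can each create a file called "test" without conflict, while a second "test" in the same UFD is rejected.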
Tree-Structured Directory:
The two-level hierarchy eliminates name conflicts among users but is not satisfactory for users with a large
number of files. What is needed is a general hierarchy, i.e. a tree of directories. With this approach, each user
can have as many directories as are needed so that files can be grouped together in natural ways. The figure
shows directories A, B and C contained in the root directory, each belonging to a different user. The ability for
users to create an arbitrary number of subdirectories provides a powerful structuring tool for users to organize
their work. Users can access the files of other users.
In this directory structure, a path name is used to refer to the required file or directory, for example when
changing the current directory. There are two types of path names.
Absolute path: An absolute path begins at the root and follows a path down to the specified file, giving
the directory names on the path.
Relative path: A Relative path defines a path from the current directory. If the current directory is
root/spell/mail, then the relative path name prt/first refers to the same file as does the absolute path
name root/spell/mail/prt/first.
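The relationship between the two kinds of path name can be checked with Python's posixpath module. In this sketch the root directory is written as "/" (POSIX notation) rather than "root", and the function name resolve is hypothetical.

```python
import posixpath

def resolve(current_directory, path_name):
    """Return the absolute path that `path_name` refers to.

    `path_name` may be relative (interpreted from the current directory)
    or absolute (interpreted from the root).
    """
    # posixpath.join ignores current_directory when path_name is already absolute.
    return posixpath.normpath(posixpath.join(current_directory, path_name))
```

With the current directory set to /spell/mail, the relative name prt/first and the absolute name /spell/mail/prt/first resolve to the same file.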
If a directory is empty, its entry in the directory that contains it can simply be deleted. However, suppose the
directory to be deleted is not empty but contains several files or subdirectories. One of two approaches can be
taken. Some systems, such as MS-DOS, will not delete a directory unless it is empty. Thus, to delete a
directory, the user must first delete all the files in that directory. If any subdirectories exist, this procedure
must be applied recursively to them, so that they can be deleted also. This approach can result in a substantial
amount of work. An alternative approach, such as that taken by the UNIX rm command, is to provide an
option: when a request is made to delete a directory, all that directory's files and subdirectories are also to be
deleted.
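Both deletion policies are available in Python's standard library, which makes the contrast easy to demonstrate: os.rmdir refuses to remove a non-empty directory, while shutil.rmtree removes a directory together with everything below it. The directory and file names below are hypothetical.

```python
import os
import shutil
import tempfile

def build_tree():
    """Create a scratch directory containing docs/old/a.txt and return its path."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "docs", "old"))
    with open(os.path.join(root, "docs", "old", "a.txt"), "w") as f:
        f.write("x")
    return root

def try_plain_delete(path):
    """First approach: the delete succeeds only if the directory is empty."""
    try:
        os.rmdir(path)
        return True
    except OSError:
        return False

def delete_recursively(path):
    """Second approach: delete the directory with all its files and subdirectories."""
    shutil.rmtree(path)
```

The recursive form is more convenient but also more dangerous, since a single request can remove an entire directory structure.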
performed, the modified data must be copied back to the disk. The system is responsible for
transferring the data between the disk and main memory as and when required
A magnetic disk consists of a plate/platter, which is made up of metal or glass material, and its surface
is covered with magnetic material to store data on its surface.
If the data can be stored only on one side of the platter, the disk is single-sided, and if both sides are
used to hold the data, the disk is double-sided.
When the disk is in use, the spindle motor rotates the platters at a constant speed, typically 60, 90 or
120 revolutions per second.
The surface of a platter is divided into imaginary tracks and sectors. Tracks are concentric circles
where the data is stored, and are numbered from the outermost to the innermost ring, starting with
zero. There are about 50,000 to 100,000 tracks per platter and a disk generally has 1 to 5 platters.
Tracks are sub-divided into sectors. A sector is just like an arc that forms an angle at the center. It is
the smallest unit of information that can be transferred to/from the disk. There are hundreds of
sectors per track and the sector size is typically 512 bytes. The inner tracks are of smaller length than
the outer tracks. There are 500 sectors per track on the inner tracks and about 1000 sectors per track
towards the boundary.
A disk with a large number of tracks on each platter surface and more sectors per track has a higher
storage capacity.
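A rough capacity estimate follows from multiplying the quantities described above. This is a back-of-the-envelope sketch: it assumes every track holds the same number of sectors (an average of 750 is used in the example, between the inner and outer figures) and treats the track count as per surface; the function name is hypothetical.

```python
def disk_capacity_bytes(platters, surfaces_per_platter, tracks_per_surface,
                        sectors_per_track, bytes_per_sector=512):
    """Estimate capacity as surfaces x tracks x sectors x bytes per sector."""
    surfaces = platters * surfaces_per_platter
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector

# Example: 2 double-sided platters, 50,000 tracks per surface,
# an assumed average of 750 sectors per track, 512-byte sectors.
example = disk_capacity_bytes(2, 2, 50_000, 750)
```

With these assumed figures the estimate comes to 76,800,000,000 bytes, i.e. about 76.8 GB.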
A disk contains one read/write head for each surface of a platter which is used to perform read and
write operation. Information is stored on a sector magnetically by read/write head. The head moves
across the surface of the platter to access different tracks.
All the heads are attached to a single assembly called a disk arm. Thus all heads of different platters
move together.
The assembly of disk platters mounted on a spindle together with the heads mounted on a disk arm is
known as head-disk assembly.
At any one time, all the read/write heads are positioned over tracks of equal diameter on the different
platters. The tracks of equal diameter on different platters form a cylinder.
Transfer of data between memory and disk drive is handled by a disk controller, which interfaces the
disk drive to the computer system. Some common interfaces used for disk drives are SCSI (Small Computer
System Interface; pronounced “scuzzy”), ATA (Advanced Technology Attachment), SATA (Serial ATA) and
PATA (Parallel ATA).
Root Directory:
The Root Directory is like a table of contents for the information stored on the hard disk drive.
The directory area keeps the information about the file name, date and time of the file creation, file
attribute, file size and starting cluster of the particular file.
The number of files that one can store on the root directory depends on the FAT type being used.
RAID 0: This level stripes the data equally across multiple available drives, giving very high read and write
performance but offering no fault tolerance or redundancy. It therefore provides none of the redundancy that
RAID is known for and is not suitable for an organization looking for redundancy; it is preferred where high
performance is required. Data is broken into stripes of user-defined size, and each stripe is written to a
different drive in the array. A minimum of two disks is required. It uses 100% of the storage capacity since no
redundant information is written. This level is recommended when your data changes infrequently and is backed
up regularly and you require high-speed access. Web servers, graphics design, audio and video editing, and
online gaming are some example applications that might benefit from this level.
Calculation:
No. of Disk: 5
Size of each disk: 100GB
Usable Disk size: 500GB
Pros:
Data is striped across multiple drives
Disk space is fully utilized
Minimum 2 drives required
High performance
Cons:
No support for data redundancy
No support for fault tolerance
No error detection mechanism
Failure of any disk results in complete data loss in the respective array
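Striping in RAID 0 can be pictured as dealing stripes out to the drives in round-robin order. The sketch below models each drive as a Python list; the function names and the round-robin placement are illustrative assumptions, not a description of any specific controller.

```python
def stripe(data, num_disks, stripe_size):
    """Split `data` into stripes and deal them round-robin onto `num_disks` disks."""
    disks = [[] for _ in range(num_disks)]
    for n, offset in enumerate(range(0, len(data), stripe_size)):
        disks[n % num_disks].append(data[offset:offset + stripe_size])
    return disks

def unstripe(disks):
    """Reassemble the data by reading the stripes back in round-robin order."""
    out = []
    longest = max(len(d) for d in disks)
    for row in range(longest):
        for d in disks:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)
```

Because every drive holds a different part of the data and nothing is stored twice, losing one list in this model makes unstripe impossible, which mirrors the lack of fault tolerance listed among the cons.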
RAID 1:
This level mirrors the data on drive 1 to drive 2. It offers 100% redundancy, as the array continues to
work even if either disk fails. Organizations looking for better redundancy can opt for this solution, but
cost can become a factor. Because data is duplicated on two drives, if either fails, the other continues to
function until the failed drive is replaced. At the cost of 50% of available capacity, this level provides very
high availability. Rebuild of failed drives is relatively fast. Read performance is good and write performance
is fair compared to single drive read and write. A minimum of 2 drives is required. Whenever high availability
is needed and vital data is involved, this level is a good candidate for use.
Calculation:
No. of Disk: 2
Size of each disk: 100GB
Usable Disk size: 100GB
Pros:
Performs mirroring of data, i.e. identical data from one drive is written to another drive for redundancy
High read speed, as either disk can be used if one disk is busy
Cons:
Expense is higher (1 extra drive required per drive for mirroring)
Slow write performance, as all drives have to be updated
RAID 2:
This level uses bit-level data striping rather than block-level striping. To be able to use RAID 2, make sure the
disks selected have no built-in error checking mechanism, as this level uses an external Hamming code for error
detection. This is one of the reasons RAID 2 is not found in the real IT world, as most of the disks used these
days come with built-in error detection. It uses an extra disk for storing all the parity information.
Calculation:
Formula: n-1 where n is the no. of disk
No. of Disk: 3
Size of each disk: 100GB
Usable Disk size: 200GB
No. of Disk: 7
Size of each disk: 100GB
Usable Disk size: 600GB
Pros:
Bit-level striping with parity
One designated drive is used to store parity
Uses Hamming code for error detection
Cons:
It is used with drives with no built-in error detection mechanism
These days all SCSI drives have error detection
Additional drives required for error detection
RAID 3:
This level uses byte-level striping along with parity. One dedicated drive is used to store the parity
information, and in case of a data drive failure the lost data is regenerated using this parity drive. But if the
parity drive crashes, the redundancy is lost, so this level is not much considered in organizations.
Calculation:
Formula: n-1 where n is the no. of disk
No. of Disk: 3
Size of each disk: 100GB
Usable Disk size: 200GB
Pros:
Byte-level striping with parity
One designated drive is used to store parity
Data is regenerated using parity drive
Data is accessed in parallel
High data transfer rates (for large files)
Minimum 3 drives required
Cons:
Additional drives required for parity
No redundancy in case parity drive crashes
Slow performance for operating on small files
RAID 4:
This level is very similar to RAID 3, except that RAID 4 uses block-level striping rather than byte-level
striping. It interleaves stripes like RAID-0, but it requires an additional drive just to store the parity, which
is used to provide redundancy. In a RAID-4 system, if any one of the disks fails, the data on the remaining disks
can be used to reconstruct the data that was on the failed disk. Even if the parity disk fails, the other disks are
still intact. Thus RAID-4 can survive the failure of any one of its disks.
Calculation:
Formula: n-1 where n is the no. of disk
No. of Disk: 3
Size of each disk: 100GB
Usable Disk size: 200GB
Pros:
Block-level striping along with dedicated parity
One designated drive is used to store parity
Data is accessed independently
Minimum 3 drives required
High read performance since data is accessed independently
Cons:
Since only 1 block is accessed at a time, performance degrades
Additional drives required for parity
Write operation becomes slow as parity has to be written every time
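How a parity drive allows lost data to be rebuilt can be shown with a short sketch. It assumes the parity block is the byte-wise XOR of the data blocks in the same stripe, which is the usual choice but is not stated in these notes; the function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Parity block for one stripe (assumption: parity = XOR of the data blocks)."""
    return xor_blocks(data_blocks)

def reconstruct(surviving_blocks):
    """Rebuild the block of the failed disk from all surviving blocks of the stripe.

    `surviving_blocks` must contain the parity block and every remaining data block.
    """
    return xor_blocks(surviving_blocks)
```

XOR-ing the parity with the surviving data blocks cancels them out and leaves exactly the missing block, which is why the array tolerates the loss of any single drive but not two.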
RAID 5:
It uses block-level striping, and with this level the distributed parity concept came into the picture, leaving
behind the traditional dedicated parity used in RAID 3 and RAID 4. Parity information is written to a different
disk in the array for each stripe. In case of a single disk failure, data can be recovered with the help of the
distributed parity without interrupting operation and other read/write operations. It is one of the most popular
RAID techniques. RAID-5 systems require a minimum of 3 disks. The impact on capacity is equivalent to removing
one drive from the array. If any one drive fails, the array is said to be degraded, and the data blocks residing
on that drive can be derived from the parity and data on the remainder of the drives. RAID controllers usually
allow a hot spare drive to be configured that is used when the array is degraded, and the array can be rebuilt in
the background while normal operation continues. RAID-5 combines good performance and good fault tolerance with
high efficiency. It is best suited for transaction processing and is often used for “general purpose” service, as
well as for relational database applications, enterprise resource planning and other business systems.
Calculation:
Formula: n-1 where n is the no. of disk
No. of Disk: 4
Size of each disk: 100GB
Usable Disk size: 300GB
Pros:
Block-level striping with distributed parity
Parity is distributed across the disks in an array
High performance
Cost effective
Minimum 3 drives required
Cons:
In case of disk failure, recovery may take longer as parity has to be calculated from all available drives
Cannot survive concurrent drive failures
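The idea of writing parity to a different disk for each stripe can be illustrated with a layout table. The rotation rule used here (parity starts on the last disk and moves one disk to the left per stripe) is only one possible convention and is an assumption; real implementations differ in the order they use.

```python
def parity_disk(stripe_number, num_disks):
    """Disk that holds the parity block of a stripe (one possible rotation)."""
    return (num_disks - 1 - stripe_number) % num_disks

def layout(num_stripes, num_disks):
    """One row per stripe: 'P' marks the parity block, 'D' a data block."""
    rows = []
    for s in range(num_stripes):
        p = parity_disk(s, num_disks)
        rows.append(["P" if d == p else "D" for d in range(num_disks)])
    return rows
```

For 4 disks, layout(4, 4) places the parity block on disk 3, then 2, then 1, then 0, so no single drive carries the whole parity load as the dedicated drive does in RAID 3 and RAID 4.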
RAID 6:
This level is an enhanced version of RAID 5 with the extra benefit of dual parity. It uses block-level
striping with dual distributed parity, which gives extra redundancy. With RAID 5, if one disk fails you need to
hurry to replace it, because if another disk fails at the same time you will not be able to recover any of the
data. RAID 6 covers this situation: it can survive 2 concurrent disk failures. The advantages of RAID-6 become
even more pronounced as the capacity of SATA drives goes up and rebuilds take longer to finish. While calculating
a second parity has a negative impact on performance in software-based RAID systems, the effect is very
minimal when hardware RAID engines that have built-in circuitry to do the parity calculations are used. RAID-6
requires a minimum of four drives, and the usable capacity is always that of 2 drives fewer than the
number of available disk drives in the RAID set. Applications suited for this level are the same as those of
level 5.
Calculation:
Formula: n-2 where n is the no. of disk
No. of Disk: 4
Size of each disk: 100GB
Usable Disk size: 200GB
Pros:
Block-level striping with dual distributed parity
2 parity blocks are created
Cons:
Cost can become a factor
Writing data takes longer due to dual parity
RAID 0+1
This level uses RAID 0 and RAID 1 together to provide redundancy. Striping of data is performed before
mirroring. In this level the overall usable capacity is reduced as compared to other RAID levels. You can sustain
more than one drive failure as long as they are not in the same mirrored set. RAID-01 is technically a
combination of RAID-1 and RAID-0: it includes both mirroring and striping, but without parity. RAID-10, by
contrast, is a stripe across a number of mirrored drives, and is implemented as a striped array whose segments
are RAID-1 arrays. RAID-10 has the same fault tolerance as RAID-1, as well as the same overhead for fault
tolerance as mirroring alone.
Advantages:
Very high I/O rates are achieved by striping RAID-1 segments
Excellent solution for sites that would normally use RAID-1
Great for Oracle and other databases which need high performance and fault tolerance
Calculation:
Formula: n/2 * size of disk (where n is the no. of disk)
No. of Disk: 8
Size of each disk: 100GB
Usable Disk size: 400GB
Pros:
No parity generation
Performs RAID 0 to stripe data and RAID 1 to mirror
Striping is performed before mirroring
Usable capacity is n/2 * size of disk (n = no. of disks)
Drives required should be a multiple of 2
High performance as data is striped
Cons:
Costly as an extra drive is required for each drive
100% disk capacity is not utilized as half is used for mirroring
Very limited scalability
Calculation:
Formula: n/2 * size of disk (where n is the no. of disk)
No. of Disk: 8
Size of each disk: 100GB
Usable Disk size: 400GB
Pros:
No parity generation
Performs RAID 1 to mirror and RAID 0 to stripe data
Cons:
Very expensive
Limited scalability
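The capacity calculations given for the individual levels can be collected into one small helper. This sketch simply encodes the formulas stated in these notes (n for RAID 0, n/2 for the mirrored levels, n-1 for single parity, n-2 for dual parity) and assumes all disks are the same size; the names DATA_DISKS and usable_capacity_gb are hypothetical.

```python
# Number of disks' worth of usable space out of n disks, per RAID level.
DATA_DISKS = {
    "0": lambda n: n,           # striping only, nothing redundant is stored
    "1": lambda n: n // 2,      # every drive has a mirror
    "3": lambda n: n - 1,       # one dedicated parity drive
    "4": lambda n: n - 1,       # one dedicated parity drive
    "5": lambda n: n - 1,       # one drive's worth of distributed parity
    "6": lambda n: n - 2,       # two drives' worth of distributed parity
    "0+1": lambda n: n // 2,    # half of the drives hold mirror copies
    "1+0": lambda n: n // 2,
}

def usable_capacity_gb(level, num_disks, disk_size_gb):
    """Usable capacity in GB for an array of equally sized disks."""
    return DATA_DISKS[level](num_disks) * disk_size_gb
```

For example, usable_capacity_gb("5", 4, 100) gives 300 and usable_capacity_gb("0+1", 8, 100) gives 400, matching the calculations shown for those levels.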