
Chapter-4

File Management System:


File Concept:
Computers store information on various storage media such as magnetic disks, magnetic
tapes and optical disks. To allow convenient usage of a computer system, the operating
system provides a uniform logical view of stored information. The operating system abstracts
the physical storage devices to define a logical storage unit, which is the file. Files are
mapped by the operating system onto physical devices, which are usually non-volatile,
meaning the contents persist across computer reboots.
A File is a named collection of related information that is recorded on secondary storage.
From the user’s perspective, a file is the smallest allotment of logical secondary storage, i.e. data
cannot be written to secondary storage unless they are within a file. Information stored in a
file is defined by the creator of the file and such information may be of different types, which
may include, source or executable programs, numeric or text data, music, videos, photos etc.
Depending on the type of file, files have certain defined structures. For example:
 A text file is a sequence of characters organized into lines.
 A source file is a sequence of functions, each of which is further organized as
declarations followed by executable statements.
 An executable file is a series of code sections that the loader can bring into memory
and execute.
The purpose of a file is to hold data required for providing information and therefore, files can
be viewed as logical and physical files.
 A Logical file is a file viewed in terms of what data items its records contain and
what processing operations may be performed on the file.
 A Physical file is a file viewed in terms of how the data is stored on a storage device,
such as a magnetic disk, and how processing operations are made possible.

File attributes:
For the convenience of human users, each file is given a specific name and therefore, a file is
referred to by its name. A name is usually a string of characters, e.g., example.doc. Once a
file is named, it becomes independent of the process, the user, and even the system that
created it. For example, one user might create a file example.doc and another user might edit
that file by specifying its name. The file owner might write the same file to a USB disk, copy
it, or send it across a network, and it could still be called example.doc on the destination
system. File attributes vary from one operating system to another, but typically include:
 Name: the symbolic file name is the only information kept in human readable form.
 Identifier: this unique tag, usually a number, identifies the file within the file system;
it is the non-human-readable name of the file.
 Type: this information is needed for systems that support different types of files.
 Location: this information is a pointer to a device and to the location of the file on that
device.
 Size: the current size of the file (in bytes, words, or blocks) and possibly the maximum
allowed size are included in this attribute.
 Protection: access-control information determines who can do reading, writing, executing,
and so on.
 Time, date and user identification: this information may be kept for creation, last
modification and last use. These data can be useful for protection, security and usage
monitoring.
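These attributes are typically kept in the directory entry or in a per-file control block. As a rough illustration only (not the layout of any particular operating system), such a record might be sketched in C as:

#include <sys/types.h>
#include <time.h>

/* Illustrative sketch of a file-attribute record; real systems
   (UNIX inodes, NTFS MFT records, ...) use different layouts. */
struct file_attributes {
    char          name[256];     /* symbolic, human-readable name            */
    unsigned long id;            /* unique identifier within the file system */
    int           type;          /* regular file, directory, device, ...     */
    unsigned long first_block;   /* location: where the data lives on device */
    off_t         size;          /* current size (bytes)                     */
    mode_t        protection;    /* access-control bits: read/write/execute  */
    time_t        created, modified, accessed;  /* time stamps               */
    uid_t         owner;         /* user identification                      */
};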
File Operations:
Any file system provides not only a means to store data organized as files, but also a collection of
operations that can be performed on files. Typical operations include:
Create: A new file is defined and positioned within the file system. Two steps are necessary
for this operation. First, a space in the file system must be found for the file.
Second, an entry for the new file must be made in the directory.
Write: a process updates a file, either by adding new data that expands the size of the file
or by changing the values of existing data items in the file. Usually a system call is made
specifying both the name of the file and the information to be written to that file.
Read: a process reads all or a portion of the data in a file. To read from a file, a system call is
made specifying the name of the file and where (in memory) the next block of the file should be
put.

Delete: A file is removed from the file structure and destroyed. Here the directory is searched
for the named file. Once found, all file space is released so that other files can reuse the
space. Lastly, the directory entry of that file is erased.

Reposition: this repositions the current position within an open file (a file seek); it does not
involve moving the file itself. The directory is searched for the appropriate entry and the
current file-position pointer is repositioned to a given value; no actual I/O need be performed.

Truncate: The user may want to erase the contents of a file but keep its attributes. Rather than
deleting the file and recreating it afresh, this operation allows all attributes to remain
unchanged – except for file length – but lets the file be reset to length zero and its file
space released.
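On POSIX systems these operations map onto system calls such as open(), write(), read(), lseek(), ftruncate() and unlink(). The following is a minimal sketch (the file name is arbitrary and error handling is reduced to asserts):

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Create: find space for the file and make a directory entry for it. */
    int fd = open("example.txt", O_CREAT | O_RDWR, 0644);
    assert(fd >= 0);

    /* Write: add data at the current file position. */
    const char msg[] = "hello, file system\n";
    assert(write(fd, msg, strlen(msg)) == (ssize_t)strlen(msg));

    /* Reposition (seek): move the current-file-position pointer. */
    assert(lseek(fd, 0, SEEK_SET) == 0);

    /* Read: bring the data back into memory. */
    char buf[64];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    assert(n > 0);
    buf[n] = '\0';
    printf("read back: %s", buf);

    /* Truncate: keep the file and its attributes but reset its length to zero. */
    assert(ftruncate(fd, 0) == 0);
    close(fd);

    /* Delete: erase the directory entry and release the file's space. */
    assert(unlink("example.txt") == 0);
    return 0;
}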

File Access Methods.

When the information stored in a file is used, that information must be accessed and read into
computer memory. There are several ways of accessing this information, and choosing the
right method for a particular application poses a major design problem.

Sequential access:
This is the simplest access method, in which information in the file is processed in order, one
record after the other. This method of access is by far the most common. For example, editors
and compilers usually access files in this fashion.

Indexed sequential access:


This mechanism is built on top of sequential access. An index is created for each file.
The index contains pointers to blocks. To find a record in the file, the index is first searched and
then the pointer is used to access the file directly and to find the desired record. NB: the index is
searched sequentially and its pointer is used to access the file directly.

Direct access:
This method is also referred to as relative access. In this method, a file is made up of fixed-
length logical records that allow programs to read and write records rapidly in no particular
order. A file is viewed as a numbered sequence of blocks or records. As a result, there are no
restrictions on the order of reading or writing for this method of access. Direct access files are
of great use for immediate access to large amounts of information. A good example is a
database in which, when a query concerning a particular subject arrives, the block containing
the answer is computed and then that block is read directly to provide the desired
information.
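The difference between sequential and direct access is easiest to see with fixed-length records. A small sketch (RECSIZE and the record layout are assumptions for illustration):

#include <sys/types.h>
#include <unistd.h>

#define RECSIZE 128   /* fixed-length logical records (assumed size) */

/* Sequential access: process the records in order, one after the other. */
void read_all_records(int fd) {
    char rec[RECSIZE];
    while (read(fd, rec, RECSIZE) == RECSIZE) {
        /* process rec ... */
    }
}

/* Direct (relative) access: jump straight to record k by computing its offset. */
int read_record(int fd, long k, char rec[RECSIZE]) {
    if (lseek(fd, (off_t)k * RECSIZE, SEEK_SET) < 0)
        return -1;
    return read(fd, rec, RECSIZE) == RECSIZE ? 0 : -1;
}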

Directory structure organization :


There are typically thousands, millions, or even billions of files within a computer, which
are stored on random-access storage devices. Files are usually segregated into groups that
are easier to manage and act upon. This organization involves the use of directories. A
directory contains a set of files or subdirectories. In modern operating systems, directories are
tree-structured which allow users to create their own subdirectories and to organize their files
accordingly. In this tree structure, the tree has a root directory and every file in the system has
a unique path name, formed from the sequence of directories on the path to the file followed by
the file name.
In normal use, each process has a current directory, i.e. the directory containing most of the
files that are of current interest to the process. When reference is made to a file, the current
directory is searched. If a needed file is not in the current directory, then the user must either
specify a path name or change the current directory to be the directory holding that file. Path
names can be of two types: absolute and relative. An absolute path name begins at the root
and follows a path down to the specified file, giving the directory names on the path. A
relative path name defines a path from the current directory.

Every directory supports a number of common operations on files:

1. File Creation
2. Search for the file
3. File deletion
4. Renaming the file
5. Traversing Files
6. Listing of files
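On POSIX systems these operations correspond roughly to mkdir(), rename(), opendir()/readdir() (listing and traversal, with searching done by comparing names), and rmdir()/unlink(). A minimal sketch with arbitrary directory names:

#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    mkdir("reports", 0755);             /* create a directory           */
    rename("reports", "reports-2024");  /* rename it                    */

    /* List / traverse the current directory, one entry at a time. */
    DIR *d = opendir(".");
    if (d) {
        struct dirent *e;
        while ((e = readdir(d)) != NULL)
            printf("%s\n", e->d_name);  /* searching = comparing names  */
        closedir(d);
    }

    rmdir("reports-2024");              /* delete the (empty) directory */
    return 0;
}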

Single Level Directory

The simplest method is to have one big list of all the files on the disk. The entire system
contains only one directory, which lists all the files present in the file system; the directory
contains one entry for each file.

This type of directory can be used only for simple systems.

Advantages

1. Implementation is very simple.


2. If the number of files is small, then searching becomes faster.
3. File creation, searching, and deletion are very simple, since we have only one directory.

Disadvantages

1. We cannot have two files with the same name.


2. The directory may be very big, therefore searching for a file may take a long time.
3. Protection cannot be implemented for multiple users.
4. There is no way to group files of the same kind.
5. Choosing a unique name for every file is a bit complex and limits the number of
files in the system, because most operating systems limit the number of
characters used to construct a file name.
Two Level Directory

In two-level directory systems, we can create a separate directory for each user. There is one
master directory which contains a separate directory dedicated to each user. For each user,
there is a different directory present at the second level, containing that user's group of files.
The system does not let a user enter another user's directory without permission.

Characteristics of two level directory system

1. Each file has a path name of the form /user-name/file-name.


2. Different users can have the same file name.
3. Searching becomes more efficient as only one user's list needs to be traversed.
4. The same kind of files cannot be grouped into a single directory for a particular user.

Every operating system maintains a variable, such as PWD, which contains the present
directory name (the present user name) so that the searching can be done appropriately.

Tree Structure/ Hierarchical Structure:


The tree directory structure is the one most commonly used in our personal
computers. Users can create files and subdirectories too, something the previous
directory structures did not allow.
This directory structure resembles a real tree upside down, where the root directory is at
the peak. This root contains all the directories for each user. The users can create
subdirectories and even store files in their directory.
A user does not have access to the root directory data and cannot modify it. Moreover, a
user does not have access to other users’ directories. The structure of a tree
directory is given below, which shows how there are files and subdirectories in each user’s
directory.
Tree/Hierarchical Directory Structure

Advantages:
 This directory structure allows subdirectories inside a directory.
 The searching is easier.
 Sorting files into important and unimportant ones becomes easier.
 This directory is more scalable than the other two directory structures
explained.
Disadvantages:
 As a user is not allowed to access another user’s directory, this prevents file
sharing among users.
 As users have the capability to make subdirectories, if the number of
subdirectories increases, the searching may become complicated.
 Users cannot modify the root directory data.
 If files do not fit in one directory, they may have to be placed in other directories.
Acyclic Graph Structure:
In the directory structures discussed so far, there is no way to access one file from multiple
directories: a file or subdirectory can be accessed only through the directory it is present in,
and not from any other directory.
This problem is solved in acyclic graph directory structure, where a file in one directory can
be accessed from multiple directories. In this way, the files could be shared in between the
users. It is designed in a way that multiple directories point to a particular directory or file
with the help of links.
In the figure below, this can be observed nicely, where a file is shared between
multiple users. If any user makes a change, it is reflected for all the users sharing the file.

Acyclic Graph Structure


Advantages:
 Sharing of files and directories is allowed between multiple users.
 Searching becomes easier.
 Flexibility is increased as file sharing and editing access is there for multiple
users.
Disadvantages:
 Because of the complex structure it has, it is difficult to implement this directory
structure.
 The user must be very cautious when editing or even deleting a file, as the file may be
accessed by multiple users.
 If we need to delete the file, then we need to delete all the references to the file
in order to delete it permanently.
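On UNIX-like systems the links that make this sharing possible are created with the link() (hard link) and symlink() (symbolic link) system calls. A minimal sketch, assuming a file named shared.txt already exists:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Hard link: a second directory entry that refers to the same file data.
       Removing one name leaves the data reachable through the other name. */
    if (link("shared.txt", "alias.txt") != 0)
        perror("link");

    /* Symbolic link: a small file that simply stores a path name to follow. */
    if (symlink("shared.txt", "shortcut.txt") != 0)
        perror("symlink");
    return 0;
}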

File Sharing:
Definition of file sharing

File sharing refers to the process of sharing or distributing electronic files such as documents,
music, videos, images, and software between two or more users or computers.

Importance of file sharing

File sharing plays a vital role in facilitating collaboration and communication among
individuals and organizations. It allows people to share files quickly and easily across
different locations, reducing the need for physical meetings and enabling remote work. File
sharing also helps individuals and organizations save time and money, as it eliminates the
need for physical transportation of files.

Risks and challenges of file sharing

File sharing can pose several risks and challenges, including the spread of malware and
viruses, data breaches and leaks, legal consequences, and identity theft. Unauthorized access
to sensitive files can also result in loss of intellectual property, financial losses, and
reputational damage.

The need for file protection

With the increase in cyber threats and the sensitive nature of the files being shared, it is
essential to implement adequate file protection measures to secure the files from unauthorized
access, theft, and cyberattacks. Effective file protection measures can help prevent data
breaches and other cyber incidents, safeguard intellectual property, and maintain business
continuity.
Types of File Sharing

File sharing refers to the practice of distributing or providing access to digital files, such as
documents, images, audio, and video files, between two or more users or devices. There are
several types of file sharing methods available, and each method has its own unique
advantages and disadvantages.

 Peer-to-Peer (P2P) File Sharing − Peer-to-peer file sharing allows users to share
files with each other without the need for a centralized server. Instead, users connect
to each other directly and exchange files through a network of peers. P2P file sharing
is commonly used for sharing large files such as movies, music, and software.
 Cloud-Based File Sharing − Cloud-based file sharing involves the storage of files in
a remote server, which can be accessed from any device with an internet connection.
Users can upload and download files from cloud-based file sharing services such as
Google Drive, Dropbox, and OneDrive. Cloud-based file sharing allows users to
easily share files with others, collaborate on documents, and access files from
anywhere.
 Direct File Transfer − Direct file transfer involves the transfer of files between two
devices through a direct connection such as Bluetooth or Wi-Fi Direct. Direct file
transfer is commonly used for sharing files between mobile devices or laptops.
 Removable Media File Sharing − Removable media file sharing involves the use of
physical storage devices such as USB drives or external hard drives. Users can copy
files onto the device and share them with others by physically passing the device to
them.

Each type of file sharing method comes with its own set of risks and challenges. Peer-to-peer
file sharing can expose users to malware and viruses, while cloud-based file sharing can lead
to data breaches if security measures are not implemented properly. Direct file transfer and
removable media file sharing can also lead to data breaches if devices are lost or stolen.

To protect against these risks, users should take precautions such as using encryption,
password protection, secure file transfer protocols, and regularly updating antivirus and
antimalware software. It is also essential to educate users on safe file sharing practices and
limit access to files only to authorized individuals or groups. By taking these steps, users can
ensure that their files remain secure and protected during file sharing.

Risks of File Sharing

File sharing is a convenient and efficient way to share information and collaborate on
projects. However, it comes with several risks and challenges that can compromise the
confidentiality, integrity, and availability of files. In this section, we will explore some of the
most significant risks of file sharing.

 Malware and Viruses − One of the most significant risks of file sharing is the spread
of malware and viruses. Files obtained from untrusted sources, such as peer-to-peer
(P2P) networks, can contain malware that can infect the user's device and compromise
the security of their files. Malware and viruses can cause damage to the user's device,
steal personal information, or even use their device for illegal activities without their
knowledge.
 Data Breaches and Leaks − Another significant risk of file sharing is the possibility
of data breaches and leaks. Cloud-based file sharing services and P2P networks are
particularly vulnerable to data breaches if security measures are not implemented
properly. Data breaches can result in the loss of sensitive information, such as
personal data or intellectual property, which can have severe consequences for both
individuals and organizations.
 Legal Consequences − File sharing copyrighted material without permission can lead
to legal consequences. Sharing copyrighted music, movies, or software can result in
copyright infringement lawsuits and hefty fines.
 Identity Theft − File sharing can also expose users to identity theft. Personal
information, such as login credentials or social security numbers, can be inadvertently
shared through file sharing if security measures are not implemented properly.
Cybercriminals can use this information to commit identity theft, which can have
severe consequences for the victim.

To protect against these risks, users should take precautions such as using trusted sources for
file sharing, limiting access to files, educating users on safe file sharing practices, and
regularly updating antivirus and anti-malware software. By taking these steps, users can
reduce the risk of malware and viruses, data breaches and leaks, legal consequences, and
identity theft during file sharing.

File Sharing Protection Measures

 Encryption − Encryption is the process of converting data into a coded language that
can only be accessed by authorized users with a decryption key. This can help protect
files from unauthorized access and ensure that data remains confidential even if it is
intercepted during file sharing.
 Password protection − Password protection involves securing files with a password
that must be entered before the file can be accessed. This can help prevent
unauthorized access to files and ensure that only authorized users can view or modify
the files.
 Secure file transfer protocols − Secure file transfer protocols, such as SFTP (Secure
File Transfer Protocol) and HTTPS (Hypertext Transfer Protocol Secure), provide a
secure way to transfer files over the internet. These protocols use encryption and other
security measures to protect files from interception and unauthorized access during
transfer.
 Firewall protection − Firewall protection involves using a firewall to monitor and
control network traffic to prevent unauthorized access to the user's device or network.
Firewalls can also be configured to block specific file sharing protocols or limit
access to certain users or devices, providing an additional layer of protection for
shared files.

File Protection:
File protection in an operating system refers to the various mechanisms and techniques used
to secure files from unauthorized access, alteration, or deletion. It involves controlling access
to files, ensuring their security and confidentiality, and preventing data breaches and other
security incidents.

Type of File protection

 File Permissions − File permissions are a basic form of file protection that controls
access to files by setting permissions for users and groups. File permissions allow the
system administrator to assign specific access rights to users and groups, which can
include read, write, and execute privileges. These access rights can be assigned at the
file or directory level, allowing users and groups to access specific files or directories
as needed. File permissions can be modified by the system administrator at any time
to adjust access privileges, which helps to prevent unauthorized access.
 Encryption − Encryption is the process of converting plain text into ciphertext to
protect files from unauthorized access. Encrypted files can only be accessed by
authorized users who have the correct encryption key to decrypt them. Encryption is
widely used to secure sensitive data such as financial information, personal data, and
other confidential information. In an operating system, encryption can be applied to
individual files or entire directories, providing an extra layer of protection against
unauthorized access.
 Access Control Lists (ACLs) − Access control lists (ACLs) are lists of permissions
attached to files and directories that define which users or groups have access to them
and what actions they can perform on them. ACLs can be more granular than file
permissions, allowing the system administrator to specify exactly which users or
groups can access specific files or directories. ACLs can also be used to grant or deny
specific permissions, such as read, write, or execute privileges, to individual users or
groups.
 Auditing and Logging − Auditing and logging are mechanisms used to track and
monitor file access, changes, and deletions. It involves creating a record of all file
access and changes, including who accessed the file, what actions were performed,
and when they were performed. Auditing and logging can help to detect and prevent
unauthorized access and can also provide an audit trail for compliance purposes.
 Physical File Security − Physical file security involves protecting files from physical
damage or theft. It includes measures such as file storage and access control, backup
and recovery, and physical security best practices. Physical file security is essential
for ensuring the integrity and availability of critical data, as well as compliance with
regulatory requirements
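To make the file-permission mechanism described above concrete: POSIX systems expose read/write/execute bits for the owner, the group, and others through calls such as chmod() and stat(). A minimal sketch (the file name is arbitrary; ACL manipulation is system-specific and not shown):

#include <stdio.h>
#include <sys/stat.h>

int main(void) {
    /* Owner may read and write; group and others may only read. */
    if (chmod("payroll.txt", S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) != 0)
        perror("chmod");

    /* Read the permission bits back to verify the access-control setting. */
    struct stat st;
    if (stat("payroll.txt", &st) == 0)
        printf("mode bits: %o\n", (unsigned)(st.st_mode & 0777));
    return 0;
}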

Advantages of File protection

 Data Security − File protection mechanisms such as encryption, access control lists,
and file permissions provide robust data security by preventing unauthorized access to
files. These mechanisms ensure that only authorized users can access files, which
helps to prevent data breaches and other security incidents. Data security is critical for
organizations that handle sensitive data such as personal data, financial information,
and intellectual property.
 Compliance − File protection mechanisms are essential for compliance with
regulatory requirements such as GDPR, HIPAA, and PCI-DSS. These regulations
require organizations to implement appropriate security measures to protect sensitive
data from unauthorized access, alteration, or deletion. Failure to comply with these
regulations can result in significant financial penalties and reputational damage.
 Business Continuity − File protection mechanisms are essential for ensuring business
continuity by preventing data loss due to accidental or malicious deletion, corruption,
or other types of damage. File protection mechanisms such as backup and recovery,
auditing, and logging can help to recover data quickly in the event of a data loss
incident, ensuring that business operations can resume as quickly as possible.
 Increased Productivity − File protection mechanisms can help to increase
productivity by ensuring that files are available to authorized users when they need
them. By preventing unauthorized access, alteration, or deletion of files, file
protection mechanisms help to minimize the risk of downtime and data loss incidents
that can impact productivity.
 Enhanced Collaboration − File protection mechanisms can help to enhance
collaboration by allowing authorized users to access and share files securely. Access
control lists, file permissions, and encryption can help to ensure that files are only
accessed by authorized users, which helps to prevent conflicts and misunderstandings
that can arise when multiple users access the same file.
 Reputation − File protection mechanisms can enhance an organization's reputation by
demonstrating a commitment to data security and compliance. By implementing
robust file protection mechanisms, organizations can build trust with their customers,
partners, and stakeholders, which can have a positive impact on their reputation and
bottom line.

Disadvantages of File protection

 Overhead − Some file protection mechanisms such as encryption, access control lists,
and auditing can add overhead to system performance. This can impact system
resources and slow down file access and processing times.
 Complexity − File protection mechanisms can be complex and require specialized
knowledge to implement and manage. This can lead to errors and misconfigurations
that compromise data security.
 Compatibility Issues − Some file protection mechanisms may not be compatible with
all types of files or applications, leading to compatibility issues and limitations in file
usage.
 Cost − Implementing robust file protection mechanisms can be expensive, especially
for small organizations with limited budgets. This can make it difficult to achieve full
data protection.
 User Frustration − Stringent file protection mechanisms such as complex passwords,
frequent authentication requirements, and restricted access can frustrate users and
impact productivity.

File System Structure:

A file system provides efficient access to the disk by allowing data to be stored, located and
retrieved in a convenient way. A file system must be able to store a file, locate a file and
retrieve a file.
Most operating systems use a layered approach for every task, including file systems.
Every layer of the file system is responsible for some of the activities.

1. I/O Control level – Device drivers act as an interface between devices and the OS;
they help to transfer data between disk and main memory. This level takes a block number
as input and, as output, gives low-level, hardware-specific instructions.
2. Basic file system – It issues general commands to the device driver to read and
write physical blocks on disk. It manages the memory buffers and caches. A
block in the buffer can hold the contents of a disk block, and the cache stores
frequently used file-system metadata.
3. File organization module – It has information about files and the location of their
logical and physical blocks. Since physical blocks generally do not match the logical
block numbers (numbered from 0 to N), it translates logical block addresses into
physical ones. It also contains the free-space manager, which tracks unallocated blocks.
4. Logical file system – It manages the metadata of a file, i.e. all the details about a
file except its actual contents. It maintains the file structure via file control blocks.
A file control block (FCB) holds information about a file –
owner, size, permissions, and location of the file contents.

File System Implementation:


File system implementation in an operating system refers to how the file system manages
the storage and retrieval of data on a physical storage device such as a hard drive, solid-state
drive, or flash drive. The file system implementation includes several components,
including:
1. File System Structure: The file system structure refers to how the files and
directories are organized and stored on the physical storage device. This
includes the layout of file systems data structures such as the directory structure,
file allocation table, and inodes.
2. File Allocation: The file allocation mechanism determines how files are
allocated on the storage device. This can include allocation techniques such as
contiguous allocation, linked allocation, indexed allocation, or a combination of
these techniques.
3. Data Retrieval: The file system implementation determines how the data is read
from and written to the physical storage device. This includes strategies such as
buffering and caching to optimize file I/O performance.
4. Security and Permissions: The file system implementation includes features
for managing file security and permissions. This includes access control lists
(ACLs), file permissions, and ownership management.
5. Recovery and Fault Tolerance: The file system implementation includes
features for recovering from system failures and maintaining data integrity. This
includes techniques such as journaling and file system snapshots.
File system implementation is a critical aspect of an operating system as it directly impacts
the performance, reliability, and security of the system. Different operating systems use
different file system implementations based on the specific needs of the system and the
intended use cases. Some common file systems used in operating systems include NTFS
and FAT in Windows, and ext4 and XFS in Linux.
Advantages
File system implementation in an operating system provides several advantages, including:
1. Duplication of code is minimized.
2. Each file system can have its own logical file system.
3. Efficient Data Storage: File system implementation ensures efficient data
storage on a physical storage device. It provides a structured way of organizing
files and directories, which makes it easy to find and access files.
4. Data Security: File system implementation includes features for managing file
security and permissions. This ensures that sensitive data is protected from
unauthorized access.
5. Data Recovery: The file system implementation includes features for
recovering from system failures and maintaining data integrity. This helps to
prevent data loss and ensures that data can be recovered in the event of a system
failure.
6. Improved Performance: File system implementation includes techniques such
as buffering and caching to optimize file I/O performance. This results in faster
access to data and improved overall system performance.
7. Scalability: File system implementation can be designed to be scalable, making
it possible to store and retrieve large amounts of data efficiently.
8. Flexibility: Different file system implementations can be designed to meet
specific needs and use cases. This allows developers to choose the best file
system implementation for their specific requirements.
9. Cross-Platform Compatibility: Many file system implementations are cross-
platform compatible, which means they can be used on different operating
systems. This makes it easy to transfer files between different systems.

Directory Implementation:

Directory implementation in the operating system can be done using Singly Linked List and
Hash table. The efficiency, reliability, and performance of a file system are greatly affected
by the selection of directory-allocation and directory-management algorithms. There are
numerous ways in which the directories can be implemented. But we need to choose an
appropriate directory implementation algorithm that enhances the performance of the
system.

Directory Implementation using Singly Linked List

The implementation of directories using a singly linked list is easy to program but time-
consuming to execute. Here we implement a directory by using a linear list of file names
with pointers to the data blocks.

Directory Implementation Using Singly Linked List

 To create a new file, the entire list has to be searched to make sure that an entry
with the same name does not already exist.
 The new entry can then be added at the end of the list or at the beginning of
the list.
 In order to delete a file, we first search the directory for the name of the file to
be deleted. After finding it, we delete the file by releasing the space
allocated to it.
 To reuse the directory entry, we can mark that entry as unused or we can append
it to a list of free directory entries.
 For deleting a file, a linked list is a convenient choice, as unlinking the entry takes
little time.
Disadvantage
The main disadvantage of using a linked list is that finding a file requires a linear search.
Directory information is used quite frequently, and a linked-list implementation results in
slow access to a file. For this reason, the operating system maintains a cache to store the most
recently used directory information.
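A rough in-memory sketch of such a linear-list directory (illustrative only): each entry holds a file name and a pointer to the file's data blocks, and creation must scan the whole list first to reject duplicate names.

#include <stdlib.h>
#include <string.h>

struct dir_entry {               /* one directory entry                  */
    char name[32];               /* file name                            */
    long first_block;            /* pointer to the file's data blocks    */
    struct dir_entry *next;      /* next entry in the singly linked list */
};

/* Create: linear scan to ensure the name is unused, then add at the head. */
int dir_create(struct dir_entry **dir, const char *name, long block) {
    for (struct dir_entry *e = *dir; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return -1;                           /* name already exists */
    struct dir_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';
    e->first_block = block;
    e->next = *dir;
    *dir = e;
    return 0;
}

/* Delete: search by name, unlink the entry, and release its space. */
int dir_delete(struct dir_entry **dir, const char *name) {
    for (struct dir_entry **p = dir; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0) {
            struct dir_entry *victim = *p;
            *p = victim->next;
            free(victim);
            return 0;
        }
    return -1;                                   /* not found */
}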

Directory Implementation using Hash Table

An alternative data structure that can be used for directory implementation is a hash table.
It overcomes the major drawbacks of directory implementation using a linked list. In this
method, we use a hash table along with the linked list. Here the linked list stores the
directory entries, but a hash data structure is used in combination with the linked list.
In the hash table, a key-value pair is generated for each entry in the directory. A hash
function on the file name determines the key, and this key points to the corresponding file
stored in the directory. This method efficiently decreases the directory search time, as the
entire list need not be searched on every operation. Using the keys, the hash-table entries are
checked, and when the file is found it is fetched.

Directory Implementation Using Hash Table

Disadvantage:
The major drawback of using a hash table is that it generally has a fixed size, and the hash
function depends on that size. Even so, this method is usually faster than a linear search
through an entire directory using a linked list.
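A minimal sketch of the idea (fixed-size table, illustrative only): the file name is hashed to a bucket, and each bucket chains its entries in a short linked list, so a lookup scans only one chain instead of the whole directory.

#include <string.h>

#define BUCKETS 64   /* fixed table size - the main limitation noted above */

struct hentry {
    char name[32];
    long first_block;
    struct hentry *next;        /* chain of entries hashing to this bucket */
};

struct hash_dir { struct hentry *bucket[BUCKETS]; };

static unsigned hash_name(const char *s) {       /* simple string hash */
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Lookup touches only one bucket instead of the entire directory list. */
struct hentry *dir_lookup(struct hash_dir *d, const char *name) {
    for (struct hentry *e = d->bucket[hash_name(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}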

Allocation Methods:
The allocation methods define how the files are stored in the disk blocks. There are three
main disk space or file allocation methods.
 Contiguous Allocation
 Linked Allocation
 Indexed Allocation
The main idea behind these methods is to provide:
 Efficient disk space utilization.
 Fast access to the file blocks.
1. Contiguous Allocation:
In this scheme, each file occupies a contiguous set of blocks on the disk. For example, if a
file requires n blocks and is given a block b as the starting location, then the blocks
assigned to the file will be: b, b+1, b+2,……b+n-1. This means that given the starting
block address and the length of the file (in terms of blocks required), we can determine the
blocks occupied by the file.
The directory entry for a file with contiguous allocation contains
 Address of starting block
 Length of the allocated portion.
The file ‘mail’ in the following figure starts from the block 19 with length = 6 blocks.
Therefore, it occupies 19, 20, 21, 22, 23, 24 blocks.
Advantages:
 Both the Sequential and Direct Accesses are supported by this. For direct access,
the address of the kth block of the file which starts at block b can easily be
obtained as (b+k).
 This is extremely fast since the number of seeks is minimal because of the
contiguous allocation of file blocks.
Disadvantages:
 This method suffers from both internal and external fragmentation. This makes it
inefficient in terms of memory utilization.
 Increasing file size is difficult because it depends on the availability of
contiguous memory at a particular instance.
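The direct-access property mentioned in the advantages above is pure address arithmetic: if a file starts at block b, its k-th block is block b + k. A small sketch, assuming a directory entry that stores just the start block and the length:

/* Directory entry for contiguous allocation (illustrative). */
struct contig_file {
    long start;     /* address of starting block                  */
    long length;    /* length of the allocated portion, in blocks */
};

/* Physical block holding logical block k of the file, or -1 if out of range. */
long contig_block(const struct contig_file *f, long k) {
    if (k < 0 || k >= f->length)
        return -1;
    return f->start + k;    /* e.g. 'mail': start 19, k = 3 gives block 22 */
}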
2. Linked List Allocation
In this scheme, each file is a linked list of disk blocks which need not be contiguous. The
disk blocks can be scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file block. Each block
contains a pointer to the next block occupied by the file.
The file ‘jeep’ in following image shows how the blocks are randomly distributed. The last
block (25) contains -1 indicating a null pointer and does not point to any other block.

Advantages:
 This is very flexible in terms of file size. File size can be increased easily since
the system does not have to look for a contiguous chunk of memory.
 This method does not suffer from external fragmentation. This makes it
relatively better in terms of memory utilization.
Disadvantages:
 Because the file blocks are distributed randomly on the disk, a large number of
seeks are needed to access every block individually. This makes linked
allocation slower.
 It does not support random or direct access. We can not directly access the
blocks of a file. A block k of a file can be accessed by traversing k blocks
sequentially (sequential access ) from the starting block of the file via block
pointers.
 Pointers required in the linked allocation incur some extra overhead.
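A sketch of why direct access is slow here, as noted in the disadvantages above: reaching logical block k means following the chain of next-block pointers k times from the starting block (next[] is a hypothetical in-memory copy of the per-block pointers, with -1 marking the end of the chain):

/* Follow the chain k steps from the file's starting block.
   Returns the physical block number, or -1 if the file has fewer blocks. */
long linked_block(const long next[], long start_block, long k) {
    long b = start_block;
    while (k-- > 0 && b != -1)
        b = next[b];    /* one pointer lookup (one disk read) per step */
    return b;
}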
3. Indexed Allocation
In this scheme, a special block known as the Index block contains the pointers to all the
blocks occupied by a file. Each file has its own index block. The ith entry in the index
block contains the disk address of the ith file block. The directory entry contains the
address of the index block as shown in the image:

Advantages:
 This supports direct access to the blocks occupied by the file and therefore
provides fast access to the file blocks.
 It overcomes the problem of external fragmentation.
Disadvantages:
 The pointer overhead for indexed allocation is greater than linked allocation.
 For very small files, say files that span only 2-3 blocks, the indexed allocation
would keep one entire block (the index block) for the pointers, which is inefficient in
terms of memory utilization. However, in linked allocation we lose the space of
only 1 pointer per block.
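A sketch of the index lookup described above: the i-th entry of the file's index block gives the disk address of the i-th file block, so once the index block is in memory, direct access is a single table lookup (the number of entries per index block is an assumption):

#define INDEX_ENTRIES 128   /* pointers that fit in one index block (assumed) */

struct index_block {
    long block[INDEX_ENTRIES];   /* block[i] = disk address of the i-th file block */
};

long indexed_block(const struct index_block *idx, long i) {
    if (i < 0 || i >= INDEX_ENTRIES || idx->block[i] == -1)
        return -1;               /* -1 marks an unused entry */
    return idx->block[i];
}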

Free Space Management System:


Free space management is a critical aspect of operating systems as it involves managing the
available storage space on the hard disk or other secondary storage devices. The operating
system uses various techniques to manage free space and optimize the use of storage
devices. Here are some of the commonly used free space management techniques:
1. Linked Allocation: In this technique, each file is represented by a linked list of
disk blocks. When a file is created, the operating system finds enough free space
on the disk and links the blocks of the file to form a chain. This method is
simple to implement but can lead to fragmentation and wastage of space.
2. Contiguous Allocation: In this technique, each file is stored as a contiguous
block of disk space. When a file is created, the operating system finds a
contiguous block of free space and assigns it to the file. This method is efficient
as it minimizes fragmentation but suffers from the problem of external
fragmentation.
3. Indexed Allocation: In this technique, a separate index block is used to store the
addresses of all the disk blocks that make up a file. When a file is created, the
operating system creates an index block and stores the addresses of all the
blocks in the file. This method is efficient in terms of storage space and
minimizes fragmentation.
4. File Allocation Table (FAT): In this technique, the operating system uses a file
allocation table to keep track of the location of each file on the disk. When a file
is created, the operating system updates the file allocation table with the address
of the disk blocks that make up the file. This method is widely used in Microsoft
Windows operating systems.
5. Volume Shadow Copy: This is a technology used in Microsoft Windows
operating systems to create backup copies of files or entire volumes. When a file
is modified, the operating system creates a shadow copy of the file and stores it
in a separate location. This method is useful for data recovery and protection
against accidental file deletion.
The free space list can be implemented mainly as:
1. Bitmap or Bit vector – A bitmap or bit vector is a series or collection of bits
where each bit corresponds to a disk block. The bit can take two values, 0 and
1: 0 indicates that the block is allocated and 1 indicates a free block. The given
instance of disk blocks on the disk in Figure 1 (where green blocks are
allocated) can be represented by a bitmap of 16 bits as 0000111000000110
(a scanning sketch appears after this list).

Advantages –
 Simple to understand.
 Finding the first free block is efficient. It requires scanning the words
of the bitmap for a non-zero word (a 0-valued word has all bits 0). The first free
block is then found by scanning for the first 1 bit in that non-zero word.
2. Linked List – In this approach, the free disk blocks are linked together, i.e. a
free block contains a pointer to the next free block. The block number of the
very first free disk block is stored at a separate location on disk and is also cached in
memory. In Figure 2, the free-space list head points to Block 5, which points to Block 6,
the next free block, and so on. The last free block would contain a null pointer
indicating the end of the free list. A drawback of this method is the I/O required for
free-space list traversal.
3. Grouping – This approach stores the address of the free blocks in the first free
block. The first free block stores the address of some, say n free blocks. Out of
these n blocks, the first n-1 blocks are actually free and the last block contains
the address of next free n blocks. An advantage of this approach is that the
addresses of a group of free disk blocks can be found easily.
4. Counting – This approach stores the address of the first free disk block and a
number n of free contiguous disk blocks that follow the first block. Every entry
in the list would contain:
1. Address of first free disk block
2. A number n
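Here is the bit-vector scanning sketch referred to in item 1, using the convention above (bit value 1 means the block is free): skip all-zero words, then find the first set bit inside the first non-zero word.

#include <stdint.h>

/* Return the number of the first free block, or -1 if no block is free.
   bitmap[] packs one bit per block; bit value 1 = free, 0 = allocated. */
long first_free_block(const uint32_t bitmap[], long nwords) {
    for (long w = 0; w < nwords; w++) {
        if (bitmap[w] == 0)
            continue;                   /* all 32 blocks in this word are allocated */
        for (int bit = 0; bit < 32; bit++)
            if (bitmap[w] & (1u << bit))
                return w * 32 + bit;    /* block number = word offset + bit offset  */
    }
    return -1;
}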

Advantages:

1. Efficient use of storage space: Free space management techniques help to
optimize the use of storage space on the hard disk or other secondary storage
devices.
2. Easy to implement: Some techniques, such as linked allocation, are simple to
implement and require less overhead in terms of processing and memory
resources.
3. Faster access to files: Techniques such as contiguous allocation can help to
reduce disk fragmentation and improve access time to files.

Disadvantages:

1. Fragmentation: Techniques such as linked allocation can lead to fragmentation
of disk space, which can decrease the efficiency of storage devices.
2. Overhead: Some techniques, such as indexed allocation, require additional
overhead in terms of memory and processing resources to maintain index blocks.
3. Limited scalability: Some techniques, such as FAT, have limited scalability in
terms of the number of files that can be stored on the disk.
4. Risk of data loss: In some cases, such as with contiguous allocation, if a file
becomes corrupted or damaged, it may be difficult to recover the data.
Overall, the choice of free space management technique depends on the specific
requirements of the operating system and the storage devices being used. While
some techniques may offer advantages in terms of efficiency and speed, they
may also have limitations and drawbacks that need to be considered.

Efficiency & Performance:


Efficiency:

 UNIX pre-allocates inodes, which occupies space even before any files are created.
 UNIX also distributes inodes across the disk, and tries to store data files near their
inode, to reduce the distance of disk seeks between the inodes and the data.
 Some systems use variable size clusters depending on the file size.
 The more data that is stored in a directory ( e.g. last access time ), the more often the
directory blocks have to be re-written.
 As technology advances, addressing schemes have had to grow as well.
o Sun's ZFS file system uses 128-bit pointers, which should theoretically never
need to be expanded. ( The mass required to store 2^128 bytes with atomic
storage would be at least 272 trillion kilograms! )
 Kernel table sizes used to be fixed, and could only be changed by rebuilding the
kernels. Modern tables are dynamically allocated, but that requires more
complicated algorithms for accessing them.

Performance:

 Disk controllers generally include on-board caching. When a seek is requested, the
heads are moved into place, and then an entire track is read, starting from whatever
sector is currently under the heads ( reducing latency. ) The requested sector is
returned and the unrequested portion of the track is cached in the disk's electronics.
 Some OSes cache disk blocks they expect to need again in a buffer cache.
 A page cache connected to the virtual memory system is actually more efficient as
memory addresses do not need to be converted to disk block addresses and back
again.
 Some systems ( Solaris, Linux, Windows 2000, NT, XP ) use page caching for both
process pages and file data in a unified virtual memory.

Figure 12.11 - I/O without a unified buffer cache.

Figure 12.12 - I/O using a unified buffer cache.

 Page replacement strategies can be complicated with a unified cache, as one needs
to decide whether to replace process or file pages, and how many pages to
guarantee to each category of pages. Solaris, for example, has gone through many
variations, resulting in priority paging giving process pages priority over file I/O
pages, and setting limits so that neither can knock the other completely out of
memory.
 Another issue affecting performance is the question of whether to
implement synchronous writes or asynchronous writes. Synchronous writes occur in
the order in which the disk subsystem receives them, without caching; Asynchronous
writes are cached, allowing the disk subsystem to schedule writes in a more efficient
order ( See Chapter 12. ) Metadata writes are often done synchronously. Some
systems support flags to the open call requiring that writes be synchronous, for
example for the benefit of database systems that require their writes be performed
in a required order.
 The type of file access can also have an impact on optimal page replacement policies.
For example, LRU is not necessarily a good policy for sequential access files. For these
types of files progression normally goes in a forward direction only, and the most
recently used page will not be needed again until after the file has been rewound and
re-read from the beginning, ( if it is ever needed at all. ) On the other hand, we can
expect to need the next page in the file fairly soon. For this reason sequential access
files often take advantage of two special policies:
o Free-behind frees up a page as soon as the next page in the file is requested,
with the assumption that we are now done with the old page and won't need
it again for a long time.
o Read-ahead reads the requested page and several subsequent pages at the
same time, with the assumption that those pages will be needed in the near
future. This is similar to the track caching that is already performed by the
disk controller, except it saves the future latency of transferring data from the
disk controller memory into motherboard main memory.
 The caching system and asynchronous writes speed up disk writes considerably,
because the disk subsystem can schedule physical writes to the disk to minimize
head movement and disk seek times.

Recovery:
A system failure (e.g. a sudden power outage) may result in:
1. Loss of data
2. Inconsistency of data
File system recovery techniques:
1. Consistency checker:
Compares data in the directory structure with data blocks on disk, and tries to fix
inconsistencies. Examples: fsck in Unix, chkdsk in Windows.
2. Backup:
Use system programs to regularly back up data from disk to another storage device (e.g.
magnetic tape or another disk). Recover a lost file or disk by restoring data from the backup.

Log-Structured File System:

Log-based transaction-oriented file systems:


 Record each update to the file system as a transaction
 aka. journaling file system

All transactions are written to a log file

 A transaction is considered committed once it is written to the log


 However, the file system may not yet be updated

Transactions in the log are asynchronously written to the file system

 When the file system is successfully modified, the transaction is removed from the log
 If the file system crashes, all remaining transactions in the log must still be performed
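The following is only a toy sketch of the idea, not a real journaling implementation: every update is first appended to a log file and flushed (the commit point), then applied to the actual file-system structures, and after a crash the records still in the log are replayed.

#include <stdio.h>

/* One logged transaction: which block to update and its new contents (toy format). */
struct log_record {
    long block_no;
    char data[64];
};

/* Append the record to the log and flush it. Once this returns successfully,
   the transaction is considered committed. (A real journal would also force
   the data to the disk, e.g. with fsync.) */
int log_commit(FILE *log, const struct log_record *r) {
    if (fwrite(r, sizeof *r, 1, log) != 1)
        return -1;
    return fflush(log);
}

/* After a crash: replay every record still present in the log against the
   file system. Redoing an already-applied update is assumed to be harmless. */
void log_replay(FILE *log, void (*apply)(const struct log_record *)) {
    struct log_record r;
    rewind(log);
    while (fread(&r, sizeof r, 1, log) == 1)
        apply(&r);
}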

NFS(Network File System):

NFS is an abbreviation of Network File System. It is a distributed file system
protocol, developed by Sun Microsystems in 1984.

It has a client/server architecture, which consists of a client program, a server
program, and a protocol that enables communication between the client and server.

It is the protocol which allows users to access data and files remotely over
the network. Any user can easily implement the NFS protocol because it is an open
standard. Users can manipulate remote files just as if they were stored locally.
This protocol is built on the ONC RPC system.

This protocol is mainly implemented in computing environments where
centralized management of resources and data is critical. It uses the Transmission
Control Protocol (TCP) and User Datagram Protocol (UDP) for accessing and
delivering data and files.

Network File System is a protocol that works on any IP-based network. It is
implemented as a client/server application in which the NFS server manages
authorization, authentication, and clients. This protocol is used with Apple Mac OS,
Unix, and Unix-like operating systems such as Solaris, Linux, FreeBSD, and AIX.

Benefits of NFS

The benefits of NFS are as follows −

 NFS supports central management.


 NFS allows a user to log into any server and have access to their files
transparently.
 There is no manual refresh needed for new files.
 It can secure it with firewalls and Kerberos.

I/O Management:

I/O Hardware:

In order to manage and control the various I/O devices attached to a computer, the I/O
system requires some hardware and software components. I/O devices commonly
use certain hardware components: the system bus and ports.

o Ports are the plugs used to connect I/O devices to the computer.
o A bus is a set of wires to which these ports and I/O controllers are connected
and through which signals are sent for I/O commands.

1. Polling

Polling is a software technique in which a program repeatedly checks the status of a device.
The device can be a disk drive or any other peripheral device in the computer. The program
polls the device for information, such as whether it has data available or not. Polling can be a
slow way to get data from a device, because the CPU must keep checking and waiting until
the device changes its state.
In some cases polling may be desirable; for example, when several items are being
polled simultaneously and only one item updates its state at any time — the rest continue
waiting until that item acknowledges that it is done updating its state (which could take
seconds).
Polling can also be used to check whether a device is online or not. If the device is offline,
this information can be used to take some appropriate action, such as pausing or suspending
tasks that depend on the device (e.g., stopping a backup in progress).
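A sketch of a busy-wait polling loop against a device status register (device_status and STATUS_READY are hypothetical names standing in for a real memory-mapped register and its "data ready" bit):

#include <stdint.h>

#define STATUS_READY 0x01u                    /* hypothetical "data ready" bit */

/* Stand-in for a memory-mapped device status register (hypothetical). */
static volatile uint32_t device_status;

/* Busy-wait (poll) until the device reports that data is ready.
   The CPU spends these cycles doing nothing useful - the cost of polling. */
void wait_until_ready(void) {
    while ((device_status & STATUS_READY) == 0) {
        /* keep checking the status register */
    }
}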

2. Interrupts

The CPU can be interrupted by several different devices, such as I/O hardware and peripheral
devices. These interrupts are used by an I/O device to notify the CPU when it needs
attention. Several devices may require attention at the same time, but only one interrupt is
delivered to the CPU at any given moment, typically according to priority. An interrupt may
also signal an error condition — for example, a failed transfer on an I/O bus or a failed disk
drive — or a request by another device for access to memory. In each case, the interrupt
mechanism lets the operating system respond to the condition without the application
program having to be written specifically for that particular machine.
3. Direct Memory Access:

DMA (Direct Memory Access) is a way to move data between main memory and I/O devices
without involving the CPU in every byte transferred. When a process wants to transfer a block
of data to or from an I/O device (such as a disk drive), it asks the operating system to perform
the operation. The operating system programs the DMA controller with the source, the
destination, and the length of the transfer. The DMA controller then moves the data directly
between the device and main memory and raises an interrupt when the transfer is complete,
so the CPU is free to run other tasks in the meantime.

Application of I/O Interface:

I/O Interface:
There is a need for an interface whenever a CPU wants to communicate with I/O devices. The
interface is used to interpret the address which is generated by the CPU. Thus, an interface is
used to communicate with I/O devices, i.e. to share information between the CPU and I/O
devices; this is called the I/O Interface.
Various applications of I/O Interface:
One application of the I/O interface is that a program can open any file without needing
any information about the file, i.e. even basic information about the file is unknown. The
interface also has the feature that it can be used to add new devices to the computer system
without causing any disruption to the operating system. It can also be used to abstract
differences in I/O devices by identifying general kinds. Access to each general kind is through
a standardized set of functions, which is called an interface.
1. Character-stream or Block:
Both character-stream and block devices transfer data in the form of bytes. The difference
between them is that a character-stream device transfers bytes in a linear way, i.e.
one after another, whereas a block device transfers a whole block of bytes as a single unit.
2. Sequential or Random Access:
A sequential device transfers data in a fixed order determined by the device,
whereas a random-access device allows the user to instruct the device to seek to any
of the data storage locations.
3. Synchronous or Asynchronous:
A synchronous device performs data transfers with predictable response times,
in coordination with other aspects of the system. An asynchronous device
exhibits irregular or unpredictable response times, not coordinated with other
computer events.
4. Sharable or Dedicated:
A sharable device can be used concurrently by several processes or threads,
whereas a dedicated device cannot.
5. Speed of Operation:
Device speeds range from a few bytes per second to a few gigabytes
per second.
6. Read-write, read only, write-only:
Different devices perform different operations: some support both input and
output, while others support only one data transfer direction, either input or output.

Kernel I/O Subsystem:

The Kernel I/O Subsystem is a fundamental component of modern


operating systems. It is responsible for managing all input/output (I/O)
operations on a computer. The I/O subsystem provides various services
that enable efficient and secure management of the I/O operations.
I/O Request Scheduling in the Kernel

One of the key services provided by the I/O subsystem is the scheduling
of I/O requests. Scheduling involves determining the best order in which
to execute I/O requests to improve system performance, share device
access permissions fairly, and reduce the average waiting time, response
time, and turnaround time for I/O operations to complete. The OS implements scheduling
by maintaining a wait queue of requests for each device; the I/O scheduler rearranges
the order of the queued requests to improve the overall efficiency of the system.
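As a rough illustration (not the code of any particular kernel), the C sketch below keeps a
per-device wait queue ordered by target block number, so that requests for nearby blocks end
up adjacent in the queue. The struct fields and the sorted-insert policy are assumptions chosen
for the example; real schedulers also weigh fairness and request age.

#include <stdio.h>
#include <stdlib.h>

/* One pending I/O request in a device's wait queue. */
struct io_request {
    int pid;                 /* requesting process (illustrative) */
    long block;              /* target block number on the device */
    struct io_request *next;
};

/* Insert a request so the queue stays sorted by block number.
 * This is only one possible ordering policy. */
static void enqueue_sorted(struct io_request **queue, int pid, long block) {
    struct io_request *r = malloc(sizeof *r);
    r->pid = pid; r->block = block;
    while (*queue && (*queue)->block < block)
        queue = &(*queue)->next;
    r->next = *queue;
    *queue = r;
}

int main(void) {
    struct io_request *q = NULL;
    enqueue_sorted(&q, 101, 98);
    enqueue_sorted(&q, 102, 37);
    enqueue_sorted(&q, 103, 122);

    for (struct io_request *r = q; r; r = r->next)
        printf("pid %d -> block %ld\n", r->pid, r->block);

    while (q) { struct io_request *next = q->next; free(q); q = next; }
    return 0;
}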

Buffering

Another important service provided by the I/O subsystem is buffering. A buffer is a memory
area that stores data being transferred between two devices or between a device and an
application. Buffers are used to cope with speed mismatches, to provide adaptation for
different data-transfer sizes, and to support copy semantics for application I/O.
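Copy semantics can be made concrete with a small sketch, assuming a toy write path: the data is
copied into a kernel-owned buffer at the time of the call, so the application may immediately
reuse its own buffer without affecting what gets written to the device. toy_write() and the
fixed-size kernel_buffer are illustrative stand-ins, not a real kernel interface.

#include <stdio.h>
#include <string.h>

#define BUF_SIZE 64

/* Kernel-owned buffer standing in for the buffer cache. */
static char kernel_buffer[BUF_SIZE];

/* Toy "write": copy the caller's data at call time (copy semantics),
 * so the caller may immediately reuse or modify its own buffer. */
static void toy_write(const char *user_buf, size_t len) {
    if (len >= BUF_SIZE) len = BUF_SIZE - 1;
    memcpy(kernel_buffer, user_buf, len);
    kernel_buffer[len] = '\0';
}

int main(void) {
    char user_buf[BUF_SIZE] = "version 1 of the data";
    toy_write(user_buf, strlen(user_buf));

    /* The application overwrites its buffer before the device I/O
     * actually happens... */
    strcpy(user_buf, "version 2 of the data");

    /* ...but the snapshot taken at write() time is preserved. */
    printf("queued for device: %s\n", kernel_buffer);
    return 0;
}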

Caching

Caching is another service provided by the I/O subsystem. A cache is a region of fast
memory that holds a copy of data, so that access to the cached copy is much more efficient
than access to the original. The main difference between a buffer and a cache is that a
buffer may hold the only existing copy of a data item, while a cache, by definition, holds
on faster storage a copy of an item that resides elsewhere.

Spooling and Device Reservation

Spooling and device reservation are also important services provided by the I/O subsystem.
A spool is a buffer that holds the output for a device, such as a printer, that cannot
accept interleaved data streams. Each application's output is spooled to a separate disk
file instead of going directly to the printer; when an application finishes printing, the
spooling system queues the corresponding spool file for output to the printer. Device
reservation serves a similar purpose for other dedicated devices: a process can be granted
exclusive access to an idle device and releases it when finished.

Error Handling
Error handling is another crucial function of the I/O subsystem, which
guards against many kinds of hardware and application errors. An OS that
uses protected memory can prevent a complete system failure from minor
mechanical glitches. Devices and I/O transfers can fail transiently or
permanently, but the OS can handle such failures in different ways.

I/O Protection

Finally, I/O protection ensures that user processes cannot issue illegal I/O
instructions to disrupt the normal function of a system. The I/O
subsystem implements various mechanisms to prevent such disruptions
by defining all I/O instructions as privileged instructions. Users cannot
issue I/O instructions directly, preventing illegal I/O access

Transforming I/O Requests to Hardware Operations:

Modern operating systems gain significant flexibility from the multiple stages of lookup
tables in the path between a request and the physical device controller. General mechanisms
are used to pass requests between applications and drivers, so new devices and drivers can
be introduced into a computer without recompiling the kernel. In fact, some operating
systems can load device drivers on demand. At boot time, the system first probes the
hardware buses to determine which devices are present and then loads the necessary drivers,
either immediately or when they are first needed by an I/O request.

1. System call –
   When an I/O request is made, the process issues a blocking read() system call on a
   previously opened file descriptor. The system-call code in the kernel checks the
   parameters for correctness. If the requested data is already available in the buffer
   cache, the data is returned to the process and the I/O request is completed.
2. Alternative approach if input is not available –
   If the data is not in the buffer cache, physical I/O must be performed. The process
   is removed from the run queue and placed on the wait queue for the device, and the
   I/O request is scheduled. The I/O subsystem then sends the request to the device
   driver, via a subroutine call or an in-kernel message, depending on the operating
   system.
3. Role of Device driver –
   The device driver allocates kernel buffer space to receive the data and schedules
   the I/O. It then issues the command to the device controller by writing into the
   device-control registers.
4. Role of Device Controller –
   The device controller operates the device hardware, and the device hardware performs
   the actual data transfer.
5. Role of DMA controller –
   The driver may poll for status and data, or it may have set up a DMA transfer into
   kernel memory. The DMA controller manages the transfer and generates an interrupt
   when the transfer completes.
6. Role of interrupt handler –
   The interrupt is dispatched to the correct interrupt handler through the
   interrupt-vector table. The handler stores any necessary data, signals the device
   driver, and returns from the interrupt.
7. Completion of I/O request –
   The device driver receives the signal, determines that the I/O request has completed
   and what its status is, and signals the kernel I/O subsystem that the request is
   done. The kernel transfers the data or return codes to the address space of the
   requesting process and moves the process from the wait queue back to the ready queue.
8. Completion of System call –
   Moving the process to the ready queue unblocks it. When the process is next assigned
   to the CPU, it resumes execution at the point where the system call completed.
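From the application's point of view, this entire path is hidden behind one blocking call. A
minimal POSIX C example follows; the file name "example.txt" is only a placeholder, and whether
read() returns immediately from the buffer cache or sleeps on the device's wait queue is decided
inside the kernel exactly as in the steps above.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[128];

    /* Step 1: the process issues a blocking read() on an open descriptor. */
    int fd = open("example.txt", O_RDONLY);   /* file name is illustrative */
    if (fd < 0) { perror("open"); return 1; }

    /* If the data is already in the buffer cache, read() returns at once;
     * otherwise the process sleeps on the device's wait queue until the
     * driver, controller, DMA transfer, and interrupt handler complete. */
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n < 0) { perror("read"); close(fd); return 1; }

    buf[n] = '\0';
    printf("read %zd bytes: %s\n", n, buf);
    close(fd);
    return 0;
}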

STREAMS:

 The streams mechanism in UNIX provides a bi-directional pipeline


between a user process and a device driver, onto which additional
modules can be added.
 The user process interacts with the stream head.
 The device driver interacts with the device end.
 Zero or more stream modules can be pushed onto the stream, using
ioctl( ). These modules may filter and/or modify the data as it passes
through the stream.
 Each module has a read queue and a write queue.
 Flow control can be optionally supported, in which case each module
will buffer data until the adjacent module is ready to receive it. Without
flow control, data is passed along as soon as it is ready.
 User processes communicate with the stream head using either read( ) and
write( ), or putmsg( ) and getmsg( ) for message passing.
 Streams I/O is asynchronous ( non-blocking ), except for the interface
between the user process and the stream head.
 The device driver must respond to interrupts from its device - If the
adjacent module is not prepared to accept data and the device driver's
buffers are all full, then data is typically dropped.
 Streams are widely used in UNIX, and are the preferred approach for
device drivers. For example, UNIX implements sockets using streams.
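On systems that still provide STREAMS (for example, older Solaris releases exposing
<stropts.h>), the mechanism above is driven from user space with ioctl(I_PUSH) to push a
module and putmsg()/getmsg() to exchange messages with the stream head. The sketch below
assumes such a system; the device path /dev/streams_device and the module name example_module
are placeholders, and these headers are not available on Linux and other systems without STREAMS.

#include <fcntl.h>
#include <stdio.h>
#include <stropts.h>   /* STREAMS: I_PUSH, putmsg(), getmsg() */
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Device path and module name are placeholders for illustration. */
    int fd = open("/dev/streams_device", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Push a processing module onto the stream between the stream head
     * and the driver; it will see every message that flows through.   */
    if (ioctl(fd, I_PUSH, "example_module") < 0)
        perror("ioctl(I_PUSH)");

    /* Send a message downstream with putmsg(). */
    char data[] = "hello from the stream head";
    struct strbuf msg = { .len = (int)strlen(data), .buf = data };
    if (putmsg(fd, NULL, &msg, 0) < 0)
        perror("putmsg");

    close(fd);
    return 0;
}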

Disk Management:
Disk Structure:

A stepwise description of disk structure is given below:

 Disk surface divided into tracks
 A read/write head positioned just above the disk surface
 Information stored by magnetic recording on the track under the read/write head
 Fixed head disk
 Moving head disk
 Designed for large amounts of storage
 Primary design considerations: cost, size, and speed
Hardware for disk system
 Disk drive, device motor, read/write head, associated logic
Disk controller
 Determines the logical interaction with the computer
 Can service more than one drive (overlapped seeks)
Cylinder
 The same numbered tracks on all the disk surfaces
 Each track contains between 8 and 32 sectors
Sector
 Smallest unit of information that can be read from/written to the disk
 Ranges from 32 bytes to 4096 bytes
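The operating system usually addresses the disk not by cylinder/head/sector (CHS) but as a flat
array of logical blocks. Under the traditional geometry view, a common mapping is
LBA = (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1), as in the C
sketch below; the 16-head, 63-sector geometry is only an example, and modern drives remap
sectors internally, so this is just the logical picture.

#include <stdio.h>

/* Traditional CHS -> LBA mapping (sectors are numbered from 1). */
static long chs_to_lba(long cylinder, long head, long sector,
                       long heads_per_cylinder, long sectors_per_track) {
    return (cylinder * heads_per_cylinder + head) * sectors_per_track
           + (sector - 1);
}

int main(void) {
    /* Example geometry: 16 heads, 63 sectors per track (illustrative). */
    long heads = 16, spt = 63;
    printf("C=0 H=0 S=1  -> LBA %ld\n", chs_to_lba(0, 0, 1, heads, spt));
    printf("C=2 H=5 S=10 -> LBA %ld\n", chs_to_lba(2, 5, 10, heads, spt));
    return 0;
}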

Disk Attachment:

Types of disk attachment

 Internal disk attachment


 External disk attachment
 Network Attached Storage
 Storage Area Network

Internal disk attachment

Definition

Internal disk attachment refers to the process of connecting a storage device directly to
the motherboard of a computer system. This type of attachment is typically used for storage
devices that are intended to be permanent components of the computer system, such as the
primary hard disk drive.

Advantages
 Faster data transfer speeds − Internal disk attachment provides faster data
transfer speeds compared to external attachment methods, such as USB or
FireWire.
 Better power management − Internal storage devices can be more easily
managed by the operating system's power management features, allowing for
more efficient power usage.
 More secure − Since internal storage devices are physically connected to the
motherboard, they are less likely to be accidentally disconnected or removed.
Disadvantages
 Limited expansion − Internal disk attachment limits the number of storage
devices that can be connected to a computer system. This can be problematic for
users who require a large amount of storage space.
 The difficulty of access − Since internal storage devices are located inside the
computer system, accessing them for upgrades or repairs can be more difficult
and time-consuming.
 Higher cost − Internal storage devices can be more expensive than external
devices due to their higher performance and reliability requirements.

External disk attachment

Definition

External disk attachment refers to the process of connecting a storage device to a computer
system via an external port, such as USB, Thunderbolt, or FireWire. This type of attachment
is typically used for storage devices that are intended to be portable, such as external
hard drives or USB flash drives.

Advantages
 Portability − External storage devices can be easily transported and used on
multiple computer systems, making them ideal for users who require access to
their data on the go.
 Ease of access − External storage devices are located outside the computer
system, making them easy to access for upgrades or repairs.
 Expandability − External storage devices can be easily added or removed from a
computer system, allowing for more storage space as needed.
Disadvantages
 Slower data transfer speeds − External disk attachments typically provide
slower data transfer speeds compared to internal attachment methods, such as
SATA.
 Limited power management − External storage devices may not be as easily
managed by the operating system's power management features, leading to less
efficient power usage.
 Less secure − External storage devices can be accidentally disconnected or
removed, leading to potential data loss or corruption.

Network Attached Storage

Definition

Network-attached storage (NAS) is a type of storage architecture where storage devices are
connected to a network and provide file-level access to multiple clients or users. NAS
devices are typically dedicated devices that contain one or more hard drives or solid-state
drives, and they are connected to the network using standard Ethernet or Wi-Fi connections.

Advantages
 Easy to set up and manage − NAS devices are designed to be user-friendly, and
they can be easily configured and managed using a web-based interface.
 Cost-effective − NAS devices are typically less expensive than other storage
architectures, such as Storage Area Networks (SANs), and they can offer high-
capacity storage for a relatively low cost.
 Centralized storage − NAS devices provide a centralized storage location that
can be accessed by multiple users or devices on the network, which can be useful
for sharing files and backing up data.
Disadvantages
 Limited performance − NAS devices may not offer the same level of
performance as other storage architectures, such as SANs, especially for high-
performance applications.
 Limited scalability − NAS devices may be limited in terms of scalability,
especially for larger enterprise environments.
 Network dependency − NAS devices rely on network connectivity, which can be
a potential point of failure or a bottleneck for storage access.
Storage Area Network

Definition

A Storage Area Network (SAN) is a specialized network that provides block-level access to
storage devices, such as hard disk drives (HDDs), solid-state drives (SSDs), or tape
libraries. SANs are designed to provide high-speed, low-latency storage access for servers
or hosts, and they can be used to build complex storage infrastructures for enterprise data
centers.

Advantage

SANs offer several advantages over other storage architectures. They can provide high-speed,
low-latency access to storage devices, which can be critical for high-performance
applications such as databases or virtualized environments.

Disadvantage

SANs can also be complex and expensive to implement and maintain, and
they may require specialized skills and expertise to configure and
manage. They also require a dedicated network infrastructure, which can
add to the overall cost and complexity of the storage infrastructure.

Disk attachment methods


SATA

Definition

Serial ATA (SATA) is a standard for connecting storage devices to a computer system. SATA
uses a serial connection and is commonly used for connecting internal hard disk drives and
solid-state drives.

Advantages
 Faster data transfer speeds − SATA provides faster data transfer speeds
compared to older parallel ATA (PATA) standards.
 Higher storage capacity − SATA supports larger storage devices than PATA,
allowing for more data to be stored on a single device.
Disadvantages
 Limited cable length − SATA cables are limited in length, which can be
problematic for larger computer systems.
 The limited number of devices − SATA only supports a limited number of
devices per controller, which can be problematic for users who require a large
amount of storage space.

SCSI

Definition

Small Computer System Interface (SCSI) is a standard for connecting storage devices to a
computer system. SCSI uses a parallel connection and is commonly used for connecting
high-performance storage devices, such as hard disk drives and solid-state drives.

Advantages
 High data transfer speeds − SCSI provides high data transfer speeds compared
to older standards, such as PATA.
 Support for multiple devices − SCSI supports a large number of devices per
controller, making it ideal for users who require a large amount of storage space.
Disadvantages
 Higher cost − SCSI devices can be more expensive than other attachment
methods due to their higher performance and reliability requirements.
 Limited compatibility − SCSI devices may not be compatible with all computer
systems, which can be problematic for users who require a high-performance
storage solution.
SAS

Definition

Serial Attached SCSI (SAS) is a standard for connecting storage devices to a computer
system. SAS uses a serial connection and is commonly used for connecting high-performance
storage devices, such as hard disk drives and solid-state drives.

Advantages
 High data transfer speeds − provides high data transfer speeds compared to
older standards, such as PATA.
 Support for multiple devices − SAS supports a large number of devices per
controller, making it ideal for users who require a large amount of storage space.
Disadvantages
 Higher cost − SAS devices can be more expensive than other attachment
methods due to their higher performance and reliability requirements.
 Limited compatibility − SAS devices may not be compatible with all computer
systems, which can be problematic for users who require a high-performance
storage solution.

Importance of Disk Attachment in OS

 Data storage − Disk attachment is necessary for storing data on a computer system.
Without disk attachments, it would be impossible to save files or install software on the
computer.
 Performance − Disk attachment plays a critical role in system performance.
Faster and more efficient disk attachment technologies, such as Serial Attached
SCSI (SAS), can improve the speed and responsiveness of the system.
 Scalability − As data storage needs increase, disk attachment technologies
provide scalability by allowing additional disks to be added to the system. This
can be particularly important for businesses and organizations that need to store
large amounts of data.
 Redundancy − Disk attachment technologies can provide redundancy and
failover capabilities to ensure that data remains accessible even in the event of
disk failure.
 Data protection − Disk attachment technologies can provide data protection
features such as RAID (Redundant Array of Independent Disks) to protect
against data loss due to disk failure.
Disk Scheduling:
Disc scheduling is an important process in operating systems that determines
the order in which disk access requests are serviced. The objective of disc
scheduling is to minimize the time it takes to access data on the disk and to
minimize the time it takes to complete a disk access request.

Disk access time is determined by two factors:

Seek time: the time it takes for the disk head to move to the desired location on the disk.

Rotational latency: the time taken by the disk to rotate the desired data sector under the
disk head.

First-Come-First-Serve
The First-Come-First-Served (FCFS) disk scheduling algorithm is one of the
simplest and most straightforward disk scheduling algorithms used in modern
operating systems. It operates on the principle of servicing disk access
requests in the order in which they are received. In the FCFS algorithm, the
disk head is positioned at the first request in the queue and the request is
serviced. The disk head then moves to the next request in the queue and
services that request. This process continues until all requests have been
serviced.

Example

Suppose the disk access requests arrive in the order 20, 150, 90, 70, 30, 60, and the disk
head is currently located at track 50.

The total seek time = (50-20) + (150-20) + (150-90) + (90-70) + (70-30) + (60-30) = 310
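The same total can be checked mechanically by summing the absolute head movements in arrival
order, as in the small C sketch below (the request list and starting track are the ones from
the example).

#include <stdio.h>
#include <stdlib.h>

/* Total head movement when requests are serviced in arrival order (FCFS). */
static int fcfs_seek(int head, const int *req, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(req[i] - head);
        head = req[i];
    }
    return total;
}

int main(void) {
    int requests[] = {20, 150, 90, 70, 30, 60};
    int n = sizeof requests / sizeof requests[0];
    printf("FCFS total seek = %d\n", fcfs_seek(50, requests, n));  /* 310 */
    return 0;
}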

Shortest-Seek-Time-First
Shortest Seek Time First (SSTF) is a disk scheduling algorithm used in
operating systems to efficiently manage disk I/O operations. The goal of
SSTF is to minimize the total seek time required to service all the disk access
requests. In SSTF, the disk head moves to the request with the shortest seek
time from its current position, services it, and then repeats this process until
all requests have been serviced. The algorithm prioritizes disk access
requests based on their proximity to the current position of the disk head,
ensuring that the disk head moves the shortest possible distance to service
each request.
Example

For the same set of requests with the head starting at track 50, SSTF services the requests
in the order 60, 70, 90, 30, 20, 150 (when the head is at track 90, requests 30 and 150 are
equally close; this example picks 30). The total seek time =
(60-50) + (70-60) + (90-70) + (90-30) + (30-20) + (150-20) = 240
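A small C sketch of the SSTF choice is given below; it repeatedly picks the pending request
closest to the current head position. The tie-break (<=) is chosen so that, from track 90, the
equally distant request 30 is taken before 150, matching the worked example; either choice is
a valid SSTF schedule.

#include <stdio.h>
#include <stdlib.h>

/* Total head movement under SSTF: always service the pending request
 * closest to the current head position. */
static int sstf_seek(int head, int *req, int n) {
    int total = 0;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (req[i] < 0)
                continue;                                /* already serviced */
            if (best < 0 ||
                abs(req[i] - head) <= abs(req[best] - head))
                best = i;               /* <= matches the example's tie-break */
        }
        total += abs(req[best] - head);
        head = req[best];
        req[best] = -1;                                  /* mark as serviced */
    }
    return total;
}

int main(void) {
    int requests[] = {20, 150, 90, 70, 30, 60};
    int n = sizeof requests / sizeof requests[0];
    printf("SSTF total seek = %d\n", sstf_seek(50, requests, n));  /* 240 */
    return 0;
}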

SCAN
SCAN (Scanning) is a disk scheduling algorithm used in operating systems to
manage disk I/O operations. The SCAN algorithm moves the disk head in a
single direction and services all requests until it reaches the end of the disk,
and then it reverses direction and services all the remaining requests. In
SCAN, the disk head starts at one end of the disk, moves toward the other
end, and services all requests that lie in its path. Once the disk head reaches
the other end, it reverses direction and services all requests that it missed on
the way. This continues until all requests have been serviced.
Example

If the head starts at track 50 and initially moves toward the left (toward track 0), the
total seek time = (50-30) + (30-20) + (20-0) + (60-0) + (70-60) + (90-70) + (150-90) = 200
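The SCAN total can likewise be computed programmatically; the C sketch below sorts the
requests, sweeps from the starting track down to track 0, then reverses and services the
remaining requests.

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

/* SCAN with the head initially moving toward track 0: service all
 * requests below the head in descending order, travel to track 0,
 * then reverse and service the remaining requests in ascending order. */
static int scan_seek(int head, int *req, int n) {
    qsort(req, n, sizeof req[0], cmp_int);
    int total = 0, pos = head;
    for (int i = n - 1; i >= 0; i--)          /* requests below the head */
        if (req[i] <= head) { total += pos - req[i]; pos = req[i]; }
    total += pos;                             /* continue to track 0     */
    pos = 0;
    for (int i = 0; i < n; i++)               /* requests above the head */
        if (req[i] > head) { total += req[i] - pos; pos = req[i]; }
    return total;
}

int main(void) {
    int requests[] = {20, 150, 90, 70, 30, 60};
    int n = sizeof requests / sizeof requests[0];
    printf("SCAN total seek = %d\n", scan_seek(50, requests, n));  /* 200 */
    return 0;
}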

C-SCAN
The C-SCAN (Circular SCAN) algorithm operates similarly to the SCAN
algorithm, but it does not reverse direction at the end of the disk. Instead,
the disk head wraps around to the other end of the disk and continues to
service requests. This algorithm can reduce the total distance the disk head
must travel, improving disk access time. However, this algorithm can lead to
long wait times for requests that are made near the end of the disk, as they
must wait for the disk head to wrap around to the other end of the disk
before they can be serviced. The C-SCAN algorithm is often used in modern
operating systems due to its ability to reduce disk access time and improve
overall system performance.
Example

Assuming the disk has tracks numbered 0 to 199 and the head starts at track 50 moving toward
the right, the total seek time = (60-50) + (70-60) + (90-70) + (150-90) + (199-150) +
(199-0) + (20-0) + (30-20) = 378

LOOK
The LOOK algorithm is similar to the SCAN algorithm, except that the disk head travels only
as far as the final request in each direction and then reverses immediately, instead of
going all the way to the end of the disk. Because the head never sweeps across regions that
contain no pending requests, this reduces the total distance the disk head must travel and
improves disk access time. The LOOK algorithm is often used in modern operating systems due
to its ability to reduce disk access time and improve overall system performance.

Example

Considering the head direction is right, in this case, the total seek time =
(60-50) + (70-60) + (90-70) + (150-90) + (150-30) + (30-20) = 230
C-LOOK
C-LOOK is similar to the C-SCAN disk scheduling algorithm. Instead of the disk arm traveling
to the end of the disk, it goes only as far as the last request in front of the head, and
from there it jumps directly to the request closest to the other end of the disk. This
prevents the extra delay that would otherwise be caused by unnecessary traversal to the end
of the disk.
Example

For the C-LOOK algorithm, the total seek time = (60-50) + (70-60) + (90-
70) + (150-90) + (150-20) + (30-20) = 240

Disk Management:

The operating system is responsible for various operations of disk management.


Modern operating systems are constantly growing their range of services and add-
ons, and all operating systems implement four essential operating system
administration functions. These functions are as follows:

1. Process Management
2. Memory Management
3. File and Disk Management
4. I/O System Management
Disk Management of the OS includes the various aspects, such as:

1. Disk Formatting
A new magnetic disk is mainly a blank slate: it is just platters of magnetic recording
material. Before a disk may hold data, it must be divided into sectors that the disk
controller can read and write. This process is known as physical formatting or low-level
formatting.

Low-level formatting creates a unique data structure for every sector on the
drive. A data structure for a sector is made up of a header, a data region, and
a trailer. The disk controller uses the header and trailer to store information
like an error-correcting code (ECC) and a sector number.

To use a disk drive as a storage medium for files, the OS must record its own data
structures on it. It accomplishes this in two phases. The first step is to partition the
disk drive into one or more groups of cylinders; the OS may treat each partition as if it
were a separate disk. For example, one partition could hold a copy of the OS executable
code, while another could hold user files. The second step, after partitioning, is logical
formatting: the operating system stores the initial file-system data structures on the disk
drive in this step.

2. Boot Block
When a system is turned on or restarted, it must execute an initial program.
The start program of the system is called the bootstrap program. It starts the
OS after initializing all components of the system. The bootstrap program
works by looking for the OS kernel on disk, loading it into memory, and
jumping to an initial address to start the OS execution.

The bootstrap is usually kept in read-only memory on most computer systems.


It is useful since read-only memory does not require initialization and is at a fixed
location where the CPU may begin executing whether powered on or reset.
Furthermore, it may not be affected by a computer system virus because ROM is
read-only. The issue is that updating this bootstrap code needs replacing the
ROM hardware chips.

As a result, most computer systems include small bootstrap loader software in the boot ROM,
whose primary function is to load a full bootstrap program from a disk drive. The entire
bootstrap program can be modified easily, and the disk is rewritten with a fresh version.
The bootstrap program is stored in a partition and is referred to as the boot block. A boot
disk or system disk is a type of disk that contains a boot partition.
3. Bad Blocks
Disks are prone to failure due to their moving parts and tight tolerances. When a disk
drive fails completely, it must be replaced and its contents restored to the replacement
disk from backup media. More often, only one or more sectors become faulty over time, and
most disks even leave the factory with some bad blocks. These blocks are handled in various
ways, depending on the disk and controller in use.

Swap-Space Management:
Swap-space management is another low-level task of the operating system. Virtual memory
uses disk space as an extension of main memory. Since disk access is much slower than
memory access, using swap space significantly decreases system performance. The main goal
for the design and implementation of swap space is to provide the best throughput for the
virtual memory system.

Uses of Swap Space


Different operating systems use swap space in various ways. Systems that implement swapping
may use swap space to hold an entire process image, including its code and data segments.

o Swapping is a memory management technique used in multi-programming to


increase the number of processes sharing the CPU. It is a technique of removing a
process from the main memory, storing it into secondary memory, and then bringing
it back into the main memory for continued execution. This action of moving a
process out from main memory to secondary memory is called Swap Out. The action
of moving a process out from secondary memory to main memory is called Swap In.
o Paging systems may simply store pages that have been pushed out of the main
memory. The need for swap space on a system can vary from megabytes to
gigabytes. Still, it also depends on the amount of physical memory, the virtual
memory it is backing, and how it uses the virtual memory.

It is safer to overestimate than to underestimate the amount of swap space required, because
if a system runs out of swap space it may be forced to abort processes or may crash
entirely. Overestimation wastes disk space that could otherwise be used for files, but it
does no other harm.
The following table shows different systems using the amount of swap space:

S.No. System Swap Space

1. Solaris Swap space is equal to the amount of physical memory.

2. Linux Swap space is double the amount of physical memory.

RAID Structure:
RAID is a technique that makes use of a combination of multiple disks instead of
using a single disk for increased performance, data redundancy, or both. The term
was coined by David Patterson, Garth A. Gibson, and Randy Katz at the University
of California, Berkeley in 1987.

Key Evaluation Points for a RAID System


 Reliability: How many disk faults can the system tolerate?
 Availability: What fraction of the total session time is a system in uptime
mode, i.e. how available is the system for actual use?
 Performance: How good is the response time? How high is the
throughput (rate of processing work)? Note that performance contains a lot
of parameters and not just the two.
 Capacity: Given a set of N disks each with B blocks, how much useful
capacity is available to the user?
RAID is transparent to the host system: the array appears to the host as a single big disk
presenting itself as a linear array of blocks. This allows older technologies to be replaced
by RAID without making too many changes to the existing code.
Advantages of RAID
 Data redundancy: By keeping numerous copies of the data on many
disks, RAID can shield data from disk failures.
 Performance enhancement: RAID can enhance performance by
distributing data over several drives, enabling the simultaneous execution
of several read/write operations.
 Scalability: RAID is scalable, therefore by adding more disks to the array,
the storage capacity may be expanded.
 Versatility: RAID is applicable to a wide range of devices, such as
workstations, servers, and personal PCs
Disadvantages of RAID
 Cost: RAID implementation can be costly, particularly for arrays with
large capacities.
 Complexity: The setup and management of RAID might be challenging.
 Decreased performance: The parity calculations necessary for some
RAID configurations, including RAID 5 and RAID 6, may result in a
decrease in speed.
 Single point of failure: RAID is not a comprehensive backup solution,
while offering data redundancy. The array’s whole contents could be lost if
the RAID controller malfunctions.
Different RAID Levels

1. RAID-0 (Striping)
 Blocks are “striped” across disks.

RAID-0

 In the figure, blocks “0,1,2,3” form a stripe.


 Instead of placing just one block into a disk at a time, we can work with
two (or more) blocks placed into a disk before moving on to the next one.

Raid-0
Evaluation
 Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be
recovered.
 Capacity: N*B
The entire space is being used to store data. Since there is no duplication,
N disks each having B blocks are fully utilized.
Advantages
1. It is easy to implement.
2. It utilizes the storage capacity in a better way.
Disadvantages
1. A single drive loss can result in the complete failure of the system.
2. Not a good choice for a critical system.
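To make the striping layout above concrete, the C sketch below shows one simple way a RAID-0
array could map a logical block number to a (disk, offset) pair, using round-robin placement
with a one-block stripe unit. Real arrays usually use larger stripe units, so the exact mapping
here is an assumption of the example.

#include <stdio.h>

/* Round-robin striping: logical block b of an N-disk RAID-0 array
 * lives on disk (b % N) at block offset (b / N) within that disk. */
static void raid0_locate(long block, int num_disks,
                         int *disk, long *offset) {
    *disk   = (int)(block % num_disks);
    *offset = block / num_disks;
}

int main(void) {
    int disk;
    long offset;
    for (long b = 0; b < 8; b++) {
        raid0_locate(b, 4, &disk, &offset);
        printf("logical block %ld -> disk %d, offset %ld\n",
               b, disk, offset);
    }
    return 0;
}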
2. RAID-1 (Mirroring)
 More than one copy of each block is stored in a separate disk. Thus, every
block has two (or more) copies, lying on different disks.

Raid-1

 The above figure shows a RAID-1 system with mirroring level 2.


 RAID 0 cannot tolerate any disk failure, whereas RAID 1 provides reliability: a failed
disk’s contents survive on its mirror copy.
Evaluation
Assume a RAID system with mirroring level 2.
 Reliability: 1 to N/2
1 disk failure can be handled for certain because blocks of that disk would
have duplicates on some other disk. If we are lucky enough and disks 0
and 2 fail, then again this can be handled as the blocks of these disks have
duplicates on disks 1 and 3. So, in the best case, N/2 disk failures can be
handled.
 Capacity: N*B/2
Only half the space is being used to store data. The other half is just a
mirror of the already stored data.
Advantages
1. It covers complete redundancy.
2. It can increase data security and speed.
Disadvantages
1. It is highly expensive.
2. Storage capacity is less.
3. RAID-2 (Bit-Level Striping with Dedicated Parity)
 In RAID-2, errors in the data are checked at the bit level, using a Hamming-code
parity scheme to detect and correct them.
 It uses dedicated drives to store the error-correction information.
 The structure of RAID-2 is complex because extra disks are needed: some disks hold the
bits of each data word, while others hold the corresponding error-correction code.
 It is not commonly used.
Advantages
1. It uses Hamming code for error correction.
2. Dedicated drives store the error-correction information.
Disadvantages
1. It has a complex structure and high cost due to the extra drives.
2. It requires extra drives for error detection.
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
 It consists of byte-level striping with a dedicated parity disk.
 At this level, parity information is computed for the data and written to a dedicated
parity drive.
 Whenever a drive fails, the parity drive is used to reconstruct the lost data.

Raid-3

 Here Disk 3 contains the Parity bits for Disk 0, Disk 1, and Disk 2. If data
loss occurs, we can construct it with Disk 3.
Advantages
1. Data can be transferred in bulk.
2. Data can be accessed in parallel.
Disadvantages
1. It requires an additional drive for parity.
2. In the case of small-size files, it performs slowly.
5. RAID-4 (Block-Level Striping with Dedicated Parity)
 Instead of duplicating data, this adopts a parity-based approach.
Raid-4

 In the figure, we can observe one column (disk) dedicated to parity.


 Parity is calculated using a simple XOR function. If the data bits are
0,0,0,1 the parity bit is XOR(0,0,0,1) = 1. If the data bits are 0,1,1,0 the
parity bit is XOR(0,1,1,0) = 0. A simple approach is that an even number
of ones results in parity 0, and an odd number of ones results in parity 1.

Raid-4

 Assume that in the above figure, C3 is lost due to some disk failure. Then,
we can recompute the data bit stored in C3 by looking at the values of all
the other columns and the parity bit. This allows us to recover lost data.
Evaluation
 Reliability: 1
RAID-4 allows recovery of at most 1 disk failure (because of the way
parity works). If more than one disk fails, there is no way to recover the
data.
 Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1)
disks are made available for data storage, each disk having B blocks.
Advantages
1. It can reconstruct the data if at most one disk is lost.
Disadvantages
1. It cannot reconstruct the data when more than one disk is lost.
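The XOR parity idea described above can be demonstrated with a short C sketch: the parity block
is the byte-wise XOR of the data blocks, and a lost block is rebuilt by XOR-ing the parity with
the surviving blocks. The block size, disk count, and block contents are arbitrary example values.

#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 8
#define DATA_DISKS 3

/* Parity is the byte-wise XOR of the corresponding data blocks. */
static void compute_parity(unsigned char data[DATA_DISKS][BLOCK_SIZE],
                           unsigned char parity[BLOCK_SIZE]) {
    memset(parity, 0, BLOCK_SIZE);
    for (int d = 0; d < DATA_DISKS; d++)
        for (int i = 0; i < BLOCK_SIZE; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild a failed disk's block: XOR of parity and the surviving blocks. */
static void rebuild(unsigned char data[DATA_DISKS][BLOCK_SIZE],
                    const unsigned char parity[BLOCK_SIZE], int failed) {
    memcpy(data[failed], parity, BLOCK_SIZE);
    for (int d = 0; d < DATA_DISKS; d++)
        if (d != failed)
            for (int i = 0; i < BLOCK_SIZE; i++)
                data[failed][i] ^= data[d][i];
}

int main(void) {
    unsigned char data[DATA_DISKS][BLOCK_SIZE] = {
        "disk-0!", "disk-1!", "disk-2!"
    };
    unsigned char parity[BLOCK_SIZE];
    compute_parity(data, parity);

    memset(data[1], 0, BLOCK_SIZE);      /* simulate losing disk 1 */
    rebuild(data, parity, 1);
    printf("recovered disk 1: %s\n", (char *)data[1]);
    return 0;
}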
6. RAID-5 (Block-Level Striping with Distributed Parity)
 This is a slight modification of the RAID-4 system where the only
difference is that the parity rotates among the drives.
Raid-5

 In the figure, we can notice how the parity bit “rotates”.


 This was introduced to make the random write performance better.
Evaluation
 Reliability: 1
RAID-5 allows recovery of at most 1 disk failure (because of the way
parity works). If more than one disk fails, there is no way to recover the
data. This is identical to RAID-4.
 Capacity: (N-1)*B
Overall, space equivalent to one disk is utilized in storing the parity.
Hence, (N-1) disks are made available for data storage, each disk having B
blocks.
Advantages
1. Data can be reconstructed using parity bits.
2. It makes the performance better.
Disadvantages
1. Its technology is complex and extra space is required.
2. If two disks fail at the same time, the data will be lost forever.
7. RAID-6 (Block-Level Striping with Two Parity Bits)
 RAID-6 helps when there is more than one disk failure: two independent parity blocks
are generated and distributed across the disks at this level. At least four disk drives
are needed for this level.
 There are also hybrid RAIDs, which make use of more than one RAID
level nested one after the other, to fulfill specific requirements.

Raid-6

Advantages
1. Very high data Accessibility.
2. Fast read data transactions.
Disadvantages
1. Due to double parity, it has slow write data transactions.
2. Extra space is required.
Stable Storage Implementation:
Stable storage is storage that is guaranteed never to lose the information committed to it.
To achieve such storage, we need to replicate the required information on multiple storage
devices with independent failure modes. Writes of an update must be coordinated in such a
way that a failure cannot leave all of the copies in a damaged state, and, when recovering
from a failure, we can force all the copies to a consistent and correct value even if
another failure occurs during recovery.

A disk write can result in one of three outcomes:

1. Successful completion –
The data is written correctly on the disk.
2. Partial failure –
A failure occurred in the middle of the transfer, so only some of the sectors were written
with the new data, and the sector being written at the time of the failure may have been
corrupted.
3. Total failure –
The failure occurred before the disk write started, so the previous data values on the disk
remain intact.

To implement stable storage, each logical block is mapped onto two physical blocks, and an
output operation is executed as follows:

1. Write the information onto the first physical block.
2. When the first write completes successfully, perform the same operation onto the second
physical block.
3. When both operations are successful, declare the operation complete.
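A user-level approximation of this two-copy protocol is sketched below in POSIX C: the first
copy is written and forced to the medium with fsync() before the second copy is touched, so at
most one copy can ever be in a partially written state. The file names copy1.dat/copy2.dat and
the record contents are placeholders; a real implementation would write to two independent
devices and would also include the recovery pass that compares the copies after a crash.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write one copy of the record and force it to stable media before
 * returning, so at most one copy can ever be mid-update. */
static int write_copy(const char *path, const char *record) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror(path); return -1; }
    if (write(fd, record, strlen(record)) < 0 || fsync(fd) < 0) {
        perror(path); close(fd); return -1;
    }
    return close(fd);
}

int main(void) {
    const char *record = "balance=1000\n";   /* data being protected */

    /* Step 1: first physical copy.  Step 2: only after it is safely on
     * disk, the second copy.  Recovery compares the two copies and, if
     * they differ, restores the damaged one from the intact one.       */
    if (write_copy("copy1.dat", record) == 0 &&
        write_copy("copy2.dat", record) == 0)
        puts("operation declared complete");
    return 0;
}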

Tertiary-Storage Structure:
Tertiary storage units are widely employed for offsite storage or for the long-term
retention of volumes of data that are rarely accessed. Tape libraries, optical
jukeboxes, and cloud storage are a few examples of tertiary storage systems. Data is
kept on magnetic tapes, which are affordable, and long-lasting, but slower to access
than other forms of storage, in tape libraries. In general, optical jukeboxes are faster
than tape libraries but have a shorter lifespan since they store data on optical discs
like CDs or DVDs.

Features
 Low cost: Because tertiary storage is intended for rarely accessed data and
does not have to be as quick or dependable, it is typically less expensive
than primary and secondary storage.
 Large storage capacity: Tertiary storage devices are made to hold a lot of
data, usually between terabytes and petabytes.
 Offsite storage: Tertiary storage systems are frequently used for offsite
storage, which can add security and safeguard against data loss due to
disasters or other unforeseen circumstances.
 Slow access: Tertiary storage is not designed for frequent use, hence it
often accesses more slowly than main and secondary storage.
 Storage for the long term: Tertiary storage is frequently used to store
data for the long term that is not in use but must be kept for regulatory or
compliance reasons, or for data archiving.
 Data backup and recovery: Tertiary storage is frequently used for data
backup and recovery because it offers an affordable and dependable way
to store data that might be required in the event of data loss or corruption.
 Large storage capacity: Tertiary storage offers significantly larger
storage capacity compared to primary and secondary storage, making it
ideal for storing large amounts of data that may not fit in primary or
secondary storage.
 Cost-effective: Tertiary storage is typically more cost-effective than
primary and secondary storage, as it is designed for large-scale data
storage and is available in high-capacity devices.
 Easy accessibility: With tertiary storage, data can be easily accessed and
retrieved as needed, even if it is not currently being used, although tertiary
storage operates at a slower speed than primary and secondary storage.
 Improved data backup and recovery: Tertiary storage provides a
convenient backup solution for critical data and enables easy data
recovery in case of a failure or data loss in primary or secondary storage.
 Long-term data preservation: Tertiary storage is designed for long-term
data preservation, making it ideal for archiving data that is not frequently
used but must be kept for regulatory or historical purposes.
 Scalability: Tertiary storage can be easily scaled up or down to meet
changing storage requirements, making it a flexible and adaptable solution
for organizations of any size.
Applications
 Backup and Recovery: Tertiary storage is commonly used to store
backups of critical data to protect against data loss due to hardware failure
or other forms of data corruption.
 Archiving: Tertiary storage can be used to store large amounts of
historical data that is not frequently accessed but still needs to be
preserved for regulatory, legal, or business reasons.
 Digital Preservation: Tertiary storage is used to store and preserve
valuable digital content such as historical documents, audio and video
recordings, and photographs.
 Big Data Analytics: Tertiary storage systems can store large amounts of
raw data that can be processed and analyzed for insights and decision-
making.
 Cloud Storage: Tertiary storage is a component of cloud storage
solutions, where data is stored remotely and accessed over the internet.
 Data Warehouses: Tertiary storage is used to store large amounts of
structured data for business intelligence and data analysis.
 Data Lakes: Tertiary storage is used to store raw and unstructured data
for later processing and analysis.
Limitations
 Data saved on tertiary storage is not always accessible because retrieving
data from tertiary storage takes longer than from primary or secondary
storage.
 Tertiary storage is not designed for regular use, hence it often takes longer
to access than main and secondary storage.
 Data kept on tertiary storage may be hard to access because it may be
stored offsite and require specialist equipment, which can make it harder
to recover data quickly.
 Since it often necessitates the use of off-site storage facilities and
specialist technology, retrieving data from tertiary storage can be costly.
 Data loss due to physical deterioration or other problems may occur in
tertiary storage devices like tape libraries because of their limited lifespan.
 Because tertiary storage is not designed for active users and may not have
the same level of protection against data loss or corruption as primary and
secondary storage, it may not offer the same level of data security as those
two storage types.
