Chapter-3 Msc-cs-1
File attributes:
For the convenience of human users, each file is given a specific name and therefore, a file is
referred to by its name. A name is usually a string of characters, e.g., example.doc. Once a
file is named, it becomes independent of the process, the user, and even the system that
created it. For example, one user might create a file example.doc and another user might edit
that file by specifying its name. The file owner might write the same file to a USB disk, copy
it, send it across a network, and it could still be called example.doc on the destination
system. File attributes vary from one operating system to another, but they typically include the following:
Name: the symbolic file name is the only information kept in human readable form.
Identifier: this unique tag, usually a number, identifies the file within the file system;
it is the non-human-readable name of the file.
Type: this information is needed for systems that support different types of files.
Location: this information is a pointer to a device and to the location of the file on that
device.
Size: the current size of the file (in bytes, words, or blocks) and possibly the maximum
allowed size are included in this attribute.
Protection: access-control information determines who can do reading, writing, executing,
and so on.
Time, date and user identification: this information may be kept for creation, last
modification and last use. These data can be useful for protection, security and usage
monitoring.
File Operations:
Any file system provides not only a means to store data organized as files, but a collection of
functions that can be performed on files. Typical operations include:
Create: A new file is defined and positioned within the file system. Two steps are necessary
for this operation. First, space in the file system must be found for the file. Second, an
entry for the new file must be made in the directory.
Write: a process updates a file, either by adding new data that expands the size of the file
or by changing the values of existing data items in the file. Usually a system call is made
specifying both the name of the file and the information to be written to that file.
Read: a process reads all or a portion of the data in a file. To read from a file, a system call
is made specifying the name of the file and where (in memory) the next block of the file
should be put.
Delete: A file is removed from the file structure and destroyed. The directory is searched
for the named file. Once found, all file space is released so that other files can reuse the
space. Lastly, the directory entry of that file is erased.
Reposition within a file (seek): the directory is searched for the appropriate entry, and the
current file-position pointer is repositioned to a given value; no actual I/O needs to be
performed for this operation.
Truncate: The user may want to erase the contents of a file but keep its attributes. Rather
than deleting the file and recreating it afresh, this operation allows all attributes to remain
unchanged, except for the file length: the file is reset to length zero and its file space is
released.
File Access Methods:
When the information stored in a file is needed, it must be accessed and read into computer
memory. There are several ways to access this information, and choosing the right method
for a particular application is a major design problem.
Sequential access:
This is the simplest access method: information in the file is processed in order, one record
after the other. This method of access is by far the most common; for example, editors and
compilers usually access files in this fashion.
Direct access:
This method is also referred to as relative access. In this method, a file is made up of fixed-
length logical records that allow programs to read and write records rapidly in no particular
order. A file is viewed as a numbered sequence of blocks or records. As a result, there are no
restrictions on the order of reading or writing for this method of access. Direct access files are
of great use for immediate access to large amounts of information. A good example is a
database in which, when a query concerning a particular subject arrives, the block containing
the answer is computed and then that block is read directly to provide the desired
information.
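A small Python sketch of the difference, using a file made of fixed-length records (the file name and record size are placeholders for illustration): sequential access reads the records one after another, while direct access seeks straight to record k.

```python
RECORD_SIZE = 64  # assumed fixed record length in bytes


def read_sequential(path):
    """Process records one after another, in order."""
    with open(path, "rb") as f:
        while (record := f.read(RECORD_SIZE)):
            yield record


def read_direct(path, k):
    """Jump straight to record number k (a relative block number)."""
    with open(path, "rb") as f:
        f.seek(k * RECORD_SIZE)   # compute the offset of record k directly
        return f.read(RECORD_SIZE)
```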
Directory Structure:
A directory holds information about the files it contains. The following operations can be performed on a directory:
1. File creation
2. Searching for a file
3. File deletion
4. Renaming a file
5. Traversing files
6. Listing of files
Single-Level Directory:
The simplest method is to have one big list of all the files on the disk. The entire system
contains only one directory, which is supposed to list all the files present in the file
system. The directory contains one entry for each file present on the file system.
Advantages
Implementation is very simple, and locating a file is quick when the number of files is small.
Disadvantages
Since all files live in the same directory, no two files can have the same name, and there is no way to group related files; searching also becomes inefficient as the number of files grows.
Two-Level Directory:
In a two-level directory system, we can create a separate directory for each user. There is one
master directory which contains a separate directory dedicated to each user. For each user,
there is a different directory at the second level, containing that user's group of files. The
system does not let a user enter another user's directory without permission.
Every operating system maintains a variable such as PWD (present working directory) which
contains the present directory name (the present user name) so that searching can be done
appropriately.
Tree-Structured Directory:
In a tree-structured directory, directories can themselves contain subdirectories, so the
directory hierarchy forms a tree; each user can create subdirectories inside their own
directory to organize files.
Advantages:
This directory structure allows subdirectories inside a directory.
The searching is easier.
Sorting files into important and unimportant ones becomes easier.
This directory structure is more scalable than the other two directory structures explained above.
Disadvantages:
As a user is not allowed to access another user's directory, file sharing among users is
prevented.
As users have the capability to create subdirectories, searching may become complicated
if the number of subdirectories increases.
Users cannot modify the root directory data.
If files do not fit in one directory, they may have to be placed in other directories.
Acyclic Graph Structure:
In the tree-structured and two-level directory structures described above, a file cannot be
accessed from multiple directories: the file or subdirectory can be accessed only through the
directory it is present in, and not from any other directory.
This problem is solved in acyclic graph directory structure, where a file in one directory can
be accessed from multiple directories. In this way, the files could be shared in between the
users. It is designed in a way that multiple directories point to a particular directory or file
with the help of links.
When a file is shared in this way between multiple users, a change made by any one user is
reflected for the other users as well.
File Sharing:
Definition of file sharing
File sharing refers to the process of sharing or distributing electronic files such as documents,
music, videos, images, and software between two or more users or computers.
File sharing plays a vital role in facilitating collaboration and communication among
individuals and organizations. It allows people to share files quickly and easily across
different locations, reducing the need for physical meetings and enabling remote work. File
sharing also helps individuals and organizations save time and money, as it eliminates the
need for physical transportation of files.
File sharing can pose several risks and challenges, including the spread of malware and
viruses, data breaches and leaks, legal consequences, and identity theft. Unauthorized access
to sensitive files can also result in loss of intellectual property, financial losses, and
reputational damage.
With the increase in cyber threats and the sensitive nature of the files being shared, it is
essential to implement adequate file protection measures to secure the files from unauthorized
access, theft, and cyberattacks. Effective file protection measures can help prevent data
breaches and other cyber incidents, safeguard intellectual property, and maintain business
continuity.
Types of File Sharing
File sharing refers to the practice of distributing or providing access to digital files, such as
documents, images, audio, and video files, between two or more users or devices. There are
several types of file sharing methods available, and each method has its own unique
advantages and disadvantages.
Peer-to-Peer (P2P) File Sharing − Peer-to-peer file sharing allows users to share
files with each other without the need for a centralized server. Instead, users connect
to each other directly and exchange files through a network of peers. P2P file sharing
is commonly used for sharing large files such as movies, music, and software.
Cloud-Based File Sharing − Cloud-based file sharing involves the storage of files in
a remote server, which can be accessed from any device with an internet connection.
Users can upload and download files from cloud-based file sharing services such as
Google Drive, Dropbox, and OneDrive. Cloud-based file sharing allows users to
easily share files with others, collaborate on documents, and access files from
anywhere.
Direct File Transfer − Direct file transfer involves the transfer of files between two
devices through a direct connection such as Bluetooth or Wi-Fi Direct. Direct file
transfer is commonly used for sharing files between mobile devices or laptops.
Removable Media File Sharing − Removable media file sharing involves the use of
physical storage devices such as USB drives or external hard drives. Users can copy
files onto the device and share them with others by physically passing the device to
them.
Each type of file sharing method comes with its own set of risks and challenges. Peer-to-peer
file sharing can expose users to malware and viruses, while cloud-based file sharing can lead
to data breaches if security measures are not implemented properly. Direct file transfer and
removable media file sharing can also lead to data breaches if devices are lost or stolen.
To protect against these risks, users should take precautions such as using encryption,
password protection, secure file transfer protocols, and regularly updating antivirus and
antimalware software. It is also essential to educate users on safe file sharing practices and
limit access to files only to authorized individuals or groups. By taking these steps, users can
ensure that their files remain secure and protected during file sharing.
File sharing is a convenient and efficient way to share information and collaborate on
projects. However, it comes with several risks and challenges that can compromise the
confidentiality, integrity, and availability of files. In this section, we will explore some of the
most significant risks of file sharing.
Malware and Viruses − One of the most significant risks of file sharing is the spread
of malware and viruses. Files obtained from untrusted sources, such as peer-to-peer
(P2P) networks, can contain malware that can infect the user's device and compromise
the security of their files. Malware and viruses can cause damage to the user's device,
steal personal information, or even use their device for illegal activities without their
knowledge.
Data Breaches and Leaks − Another significant risk of file sharing is the possibility
of data breaches and leaks. Cloud-based file sharing services and P2P networks are
particularly vulnerable to data breaches if security measures are not implemented
properly. Data breaches can result in the loss of sensitive information, such as
personal data or intellectual property, which can have severe consequences for both
individuals and organizations.
Legal Consequences − File sharing copyrighted material without permission can lead
to legal consequences. Sharing copyrighted music, movies, or software can result in
copyright infringement lawsuits and hefty fines.
Identity Theft − File sharing can also expose users to identity theft. Personal
information, such as login credentials or social security numbers, can be inadvertently
shared through file sharing if security measures are not implemented properly.
Cybercriminals can use this information to commit identity theft, which can have
severe consequences for the victim.
To protect against these risks, users should take precautions such as using trusted sources for
file sharing, limiting access to files, educating users on safe file sharing practices, and
regularly updating antivirus and anti-malware software. By taking these steps, users can
reduce the risk of malware and viruses, data breaches and leaks, legal consequences, and
identity theft during file sharing.
To reduce these risks, several protection measures can be applied during file sharing; a small encryption sketch follows this list.
Encryption − Encryption is the process of converting data into a coded form that
can only be accessed by authorized users with a decryption key. This can help protect
files from unauthorized access and ensure that data remains confidential even if it is
intercepted during file sharing.
Password protection − Password protection involves securing files with a password
that must be entered before the file can be accessed. This can help prevent
unauthorized access to files and ensure that only authorized users can view or modify
the files.
Secure file transfer protocols − Secure file transfer protocols, such as SFTP (Secure
File Transfer Protocol) and HTTPS (Hypertext Transfer Protocol Secure), provide a
secure way to transfer files over the internet. These protocols use encryption and other
security measures to protect files from interception and unauthorized access during
transfer.
Firewall protection − Firewall protection involves using a firewall to monitor and
control network traffic to prevent unauthorized access to the user's device or network.
Firewalls can also be configured to block specific file sharing protocols or limit
access to certain users or devices, providing an additional layer of protection for
shared files.
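As a sketch of the encryption idea mentioned above, the snippet below uses the third-party cryptography package's Fernet recipe to encrypt a file before sharing it and to decrypt it on the receiving side; the file names are placeholders, and the key must be exchanged over a separate, trusted channel.

```python
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()      # share this key with the recipient out of band
cipher = Fernet(key)

# Sender: encrypt the file before sharing it.
with open("report.pdf", "rb") as f:
    encrypted = cipher.encrypt(f.read())
with open("report.pdf.enc", "wb") as f:
    f.write(encrypted)

# Recipient: decrypt with the same key.
with open("report.pdf.enc", "rb") as f:
    original = cipher.decrypt(f.read())
```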
File Protection:
File protection in an operating system refers to the various mechanisms and techniques used
to secure files from unauthorized access, alteration, or deletion. It involves controlling access
to files, ensuring their security and confidentiality, and preventing data breaches and other
security incidents.
File Permissions − File permissions are a basic form of file protection that controls
access to files by setting permissions for users and groups. File permissions allow the
system administrator to assign specific access rights to users and groups, which can
include read, write, and execute privileges. These access rights can be assigned at the
file or directory level, allowing users and groups to access specific files or directories
as needed. File permissions can be modified by the system administrator at any time
to adjust access privileges, which helps to prevent unauthorized access. (A short
example of setting permission bits appears after this list.)
Encryption − Encryption is the process of converting plain text into ciphertext to
protect files from unauthorized access. Encrypted files can only be accessed by
authorized users who have the correct encryption key to decrypt them. Encryption is
widely used to secure sensitive data such as financial information, personal data, and
other confidential information. In an operating system, encryption can be applied to
individual files or entire directories, providing an extra layer of protection against
unauthorized access.
Access Control Lists (ACLs) − Access control lists (ACLs) are lists of permissions
attached to files and directories that define which users or groups have access to them
and what actions they can perform on them. ACLs can be more granular than file
permissions, allowing the system administrator to specify exactly which users or
groups can access specific files or directories. ACLs can also be used to grant or deny
specific permissions, such as read, write, or execute privileges, to individual users or
groups.
Auditing and Logging − Auditing and logging are mechanisms used to track and
monitor file access, changes, and deletions. It involves creating a record of all file
access and changes, including who accessed the file, what actions were performed,
and when they were performed. Auditing and logging can help to detect and prevent
unauthorized access and can also provide an audit trail for compliance purposes.
Physical File Security − Physical file security involves protecting files from physical
damage or theft. It includes measures such as file storage and access control, backup
and recovery, and physical security best practices. Physical file security is essential
for ensuring the integrity and availability of critical data, as well as compliance with
regulatory requirements
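As a small illustration of the file-permission mechanism described earlier in this list, the snippet below uses Python's os and stat modules to restrict a file so that only its owner can read and write it; the file name is a placeholder, and on Windows the effect is limited to the read-only flag.

```python
import os
import stat

path = "payroll.csv"  # placeholder file

# Owner may read and write; group and others get no access (mode 0o600).
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

# Inspect the resulting permission bits.
mode = os.stat(path).st_mode
print(oct(stat.S_IMODE(mode)))   # -> 0o600 on POSIX systems
```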
Advantages of File Protection:
Data Security − File protection mechanisms such as encryption, access control lists,
and file permissions provide robust data security by preventing unauthorized access to
files. These mechanisms ensure that only authorized users can access files, which
helps to prevent data breaches and other security incidents. Data security is critical for
organizations that handle sensitive data such as personal data, financial information,
and intellectual property.
Compliance − File protection mechanisms are essential for compliance with
regulatory requirements such as GDPR, HIPAA, and PCI-DSS. These regulations
require organizations to implement appropriate security measures to protect sensitive
data from unauthorized access, alteration, or deletion. Failure to comply with these
regulations can result in significant financial penalties and reputational damage.
Business Continuity − File protection mechanisms are essential for ensuring business
continuity by preventing data loss due to accidental or malicious deletion, corruption,
or other types of damage. File protection mechanisms such as backup and recovery,
auditing, and logging can help to recover data quickly in the event of a data loss
incident, ensuring that business operations can resume as quickly as possible.
Increased Productivity − File protection mechanisms can help to increase
productivity by ensuring that files are available to authorized users when they need
them. By preventing unauthorized access, alteration, or deletion of files, file
protection mechanisms help to minimize the risk of downtime and data loss incidents
that can impact productivity.
Enhanced Collaboration − File protection mechanisms can help to enhance
collaboration by allowing authorized users to access and share files securely. Access
control lists, file permissions, and encryption can help to ensure that files are only
accessed by authorized users, which helps to prevent conflicts and misunderstandings
that can arise when multiple users access the same file.
Reputation − File protection mechanisms can enhance an organization's reputation by
demonstrating a commitment to data security and compliance. By implementing
robust file protection mechanisms, organizations can build trust with their customers,
partners, and stakeholders, which can have a positive impact on their reputation and
bottom line.
Disadvantages of File Protection:
Overhead − Some file protection mechanisms such as encryption, access control lists,
and auditing can add overhead to system performance. This can impact system
resources and slow down file access and processing times.
Complexity − File protection mechanisms can be complex and require specialized
knowledge to implement and manage. This can lead to errors and misconfigurations
that compromise data security.
Compatibility Issues − Some file protection mechanisms may not be compatible with
all types of files or applications, leading to compatibility issues and limitations in file
usage.
Cost − Implementing robust file protection mechanisms can be expensive, especially
for small organizations with limited budgets. This can make it difficult to achieve full
data protection.
User Frustration − Stringent file protection mechanisms such as complex passwords,
frequent authentication requirements, and restricted access can frustrate users and
impact productivity.
File System Structure:
A file system provides efficient access to the disk by allowing data to be stored, located, and
retrieved in a convenient way. A file system must be able to store a file, locate it, and
retrieve it.
Most operating systems use a layered approach for the file system, as they do for other tasks.
Every layer of the file system is responsible for some activities.
1. I/O control level – Device drivers act as an interface between devices and the OS;
they help to transfer data between disk and main memory. A driver takes a block
number as input and produces the low-level, hardware-specific instructions as output.
2. Basic file system – It Issues general commands to the device driver to read and
write physical blocks on disk. It manages the memory buffers and caches. A
block in the buffer can hold the contents of the disk block and the cache stores
frequently used file system metadata.
3. File-organization module – It knows about files, their logical blocks, and the
corresponding physical blocks. Since logical blocks are numbered from 0 to N and this
numbering does not directly match the physical blocks, the module translates logical
block addresses into physical block addresses. It also includes a free-space manager
that keeps track of unallocated blocks.
4. Logical file system – It manages metadata, i.e., all information about a file except
its actual contents. It maintains the directory structure and file control blocks. A file
control block (FCB) holds information about a file, such as the owner, size,
permissions, and location of the file contents.
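To make the logical file system layer concrete, here is a minimal sketch of what a file control block might hold; the field names are illustrative, not those of any particular operating system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileControlBlock:
    """Per-file metadata kept by the logical file system (illustrative)."""
    owner: str
    size: int                      # current file size in bytes
    permissions: int               # e.g. 0o644
    block_pointers: List[int] = field(default_factory=list)  # location of contents


fcb = FileControlBlock(owner="alice", size=4096, permissions=0o644,
                       block_pointers=[19, 20, 21, 22])
```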
Directory Implementation:
Directory implementation in the operating system can be done using Singly Linked List and
Hash table. The efficiency, reliability, and performance of a file system are greatly affected
by the selection of directory-allocation and directory-management algorithms. There are
numerous ways in which the directories can be implemented. But we need to choose an
appropriate directory implementation algorithm that enhances the performance of the
system.
The implementation of directories using a singly linked list is easy to program but is time-
consuming to execute. Here we implement a directory by using a linear list of filenames
with pointers to the data blocks.
To create a new file, the entire list has to be searched to make sure that no file with
the same name already exists.
The new entry can then be added at the end of the list or at the beginning of the list.
In order to delete a file, we first search the directory for the name of the file to be
deleted. After finding it, we delete the file by releasing the space allocated to it.
To reuse the directory entry, we can either mark that entry as unused or append it to a
list of free directory entries.
Deletion itself is easy with a linked list, since the entry only needs to be unlinked or
marked as unused.
Disadvantage
The main disadvantage of using a linked list is that when the user needs to find a file the
user has to do a linear search. In today’s world directory information is used quite
frequently and linked list implementation results in slow access to a file. So the operating
system maintains a cache to store the most recently used directory information.
An alternative data structure that can be used for directory implementation is a hash table.
It overcomes the major drawbacks of directory implementation using a linked list. In this
method, we use a hash table along with the linked list. Here the linked list stores the
directory entries, but a hash data structure is used in combination with the linked list.
In the hash table, a key-value pair is generated for each entry in the directory. The hash
function applied to the file name determines the key, and this key points to the corresponding
entry stored in the directory list. This method efficiently decreases the directory search time,
as the entire list does not have to be searched on every operation: the hash-table entry is
computed from the key, and when the file is found it is fetched.
Disadvantage:
The major drawback of using a hash table is that it generally has a fixed size, and the hash
function depends on that size; if the table fills up, it must be enlarged and the entries
rehashed. Even so, this method is usually much faster than a linear search through an entire
directory implemented as a linked list.
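A minimal sketch of the hash-table directory idea, using a Python dict as the hash table keyed by file name; the values stand in for directory entries (here, just a starting block number).

```python
class Directory:
    """Directory implemented with a hash table for O(1) average-case lookup."""

    def __init__(self):
        self.entries = {}                    # file name -> directory entry

    def create(self, name, start_block):
        if name in self.entries:
            raise FileExistsError(name)      # duplicate names are rejected
        self.entries[name] = {"start_block": start_block}

    def lookup(self, name):
        return self.entries[name]            # hashes the name, no linear scan

    def delete(self, name):
        del self.entries[name]


d = Directory()
d.create("notes.txt", start_block=42)
print(d.lookup("notes.txt"))
```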
Allocation Methods:
The allocation methods define how the files are stored in the disk blocks. There are three
main disk space or file allocation methods.
Contiguous Allocation
Linked Allocation
Indexed Allocation
The main idea behind these methods is to provide:
Efficient disk space utilization.
Fast access to the file blocks.
1. Contiguous Allocation:
In this scheme, each file occupies a contiguous set of blocks on the disk. For example, if a
file requires n blocks and is given a block b as the starting location, then the blocks
assigned to the file will be: b, b+1, b+2,……b+n-1. This means that given the starting
block address and the length of the file (in terms of blocks required), we can determine the
blocks occupied by the file.
The directory entry for a file with contiguous allocation contains
Address of starting block
Length of the allocated portion.
For example, a file ‘mail’ that starts at block 19 with length = 6 blocks occupies blocks
19, 20, 21, 22, 23, and 24.
Advantages:
Both the Sequential and Direct Accesses are supported by this. For direct access,
the address of the kth block of the file which starts at block b can easily be
obtained as (b+k).
This is extremely fast, since the number of seeks is minimal because of the
contiguous allocation of file blocks.
Disadvantages:
This method suffers from both internal and external fragmentation. This makes it
inefficient in terms of memory utilization.
Increasing file size is difficult because it depends on the availability of
contiguous memory at a particular instance.
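A quick sketch of the direct-access computation for contiguous allocation: given the starting block b from the directory entry, block k of the file is simply b + k (the numbers below match the ‘mail’ example above).

```python
def contiguous_block(start, length, k):
    """Physical block holding logical block k of a contiguously allocated file."""
    if not 0 <= k < length:
        raise IndexError("block index outside the file")
    return start + k


# File 'mail': starts at block 19, length 6 blocks.
print(contiguous_block(19, 6, 0))   # -> 19
print(contiguous_block(19, 6, 5))   # -> 24
```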
2. Linked List Allocation
In this scheme, each file is a linked list of disk blocks which need not be contiguous. The
disk blocks can be scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file block. Each block
contains a pointer to the next block occupied by the file.
For example, the blocks of a file ‘jeep’ may be scattered randomly across the disk; the last
block (25) contains -1, indicating a null pointer, and does not point to any other block.
Advantages:
This is very flexible in terms of file size. File size can be increased easily since
the system does not have to look for a contiguous chunk of memory.
This method does not suffer from external fragmentation. This makes it
relatively better in terms of memory utilization.
Disadvantages:
Because the file blocks are distributed randomly on the disk, a large number of
seeks are needed to access every block individually. This makes linked
allocation slower.
It does not support random or direct access. We can not directly access the
blocks of a file. A block k of a file can be accessed by traversing k blocks
sequentially (sequential access ) from the starting block of the file via block
pointers.
Pointers required in the linked allocation incur some extra overhead.
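A sketch of why linked allocation gives only sequential access: reaching logical block k means following k pointers from the starting block. The next-block table below is a made-up example for the ‘jeep’ file, with -1 as the null pointer.

```python
# next_block[b] gives the next block of the file stored in block b (-1 = end).
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: -1}   # file 'jeep', starting at block 9


def linked_block(start, k):
    """Physical block holding logical block k; requires k pointer traversals."""
    block = start
    for _ in range(k):
        block = next_block[block]
        if block == -1:
            raise IndexError("block index outside the file")
    return block


print(linked_block(9, 3))   # -> 10, reached after following three pointers
```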
3. Indexed Allocation
In this scheme, a special block known as the Index block contains the pointers to all the
blocks occupied by a file. Each file has its own index block. The ith entry in the index
block contains the disk address of the ith file block. The directory entry contains the
address of the index block.
Advantages:
This supports direct access to the blocks occupied by the file and therefore
provides fast access to the file blocks.
It overcomes the problem of external fragmentation.
Disadvantages:
The pointer overhead for indexed allocation is greater than linked allocation.
For very small files, say files that span only 2-3 blocks, indexed allocation keeps one
entire block (the index block) for the pointers, which is inefficient in terms of memory
utilization. In linked allocation, by contrast, we lose the space of only one pointer per
block.
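A sketch of indexed allocation: the directory entry points at an index block, and the i-th entry of that index block gives the disk address of the i-th file block. The block numbers below are illustrative.

```python
# The index block for file 'jeep' lives at block 19; its entries are pointers.
index_blocks = {19: [9, 16, 1, 10, 25]}      # illustrative index-block contents
directory = {"jeep": {"index_block": 19}}


def indexed_block(name, i):
    """Direct access: one lookup in the index block gives file block i."""
    idx = directory[name]["index_block"]
    return index_blocks[idx][i]


print(indexed_block("jeep", 3))   # -> 10
```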
Free-Space Management:
Since disk space is limited, the space released by deleted files must be reused for new
files. The system keeps track of free disk space mainly in the following ways; a small
sketch of the bitmap scan follows this list.
1. Bitmap or Bit Vector – A bit vector is a series of bits, one for each disk block,
where a bit value of 1 indicates a free block and 0 indicates an allocated block.
Advantages –
Simple to understand.
Finding the first free block is efficient. It requires scanning the words
(a group of 8 bits) in the bitmap for a non-zero word (a 0-valued word
has all bits 0). The first free block is then found by scanning for the
first 1 bit in the non-zero word.
2. Linked List – In this approach, the free disk blocks are linked together, i.e., a
free block contains a pointer to the next free block. The block number of the
very first free block is stored at a separate location on disk and is also cached in
memory.
Advantages: No disk space is wasted in maintaining the list, since the pointers are
stored inside the free blocks themselves.
Disadvantages: Traversing the list requires substantial I/O, so finding many free
blocks is slow, and it is difficult to obtain contiguous free space.
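The sketch below mimics the bitmap scan described above, assuming 8 bits per word and treating bit b of word w as block w*8 + b; the bitmap contents are made up for illustration.

```python
def first_free_block(bitmap_words, bits_per_word=8):
    """Return the index of the first free block (bit value 1), or -1 if none."""
    for w, word in enumerate(bitmap_words):
        if word == 0:          # all blocks covered by this word are allocated
            continue
        for b in range(bits_per_word):
            if word & (1 << b):
                return w * bits_per_word + b
    return -1


# Words 0 and 1 are fully allocated; block 17 is the first free block.
print(first_free_block([0b00000000, 0b00000000, 0b00000010]))  # -> 17
```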
Efficiency:
The efficient use of disk space depends heavily on the disk-allocation and directory
algorithms in use. For example:
UNIX pre-allocates inodes, which occupies space even before any files are created.
UNIX also distributes inodes across the disk and tries to store a file's data blocks near
its inode, to reduce the distance of disk seeks between the inode and the data.
Some systems use variable size clusters depending on the file size.
The more data that is stored in a directory ( e.g. last access time ), the more often the
directory blocks have to be re-written.
As technology advances, addressing schemes have had to grow as well.
o Sun's ZFS file system uses 128-bit pointers, which should theoretically never
need to be expanded. ( The mass required to store 2^128 bytes with atomic
storage would be at least 272 trillion kilograms! )
Kernel table sizes used to be fixed, and could only be changed by rebuilding the
kernels. Modern tables are dynamically allocated, but that requires more
complicated algorithms for accessing them.
Performance:
Disk controllers generally include on-board caching. When a seek is requested, the
heads are moved into place, and then an entire track is read, starting from whatever
sector is currently under the heads ( reducing latency. ) The requested sector is
returned and the unrequested portion of the track is cached in the disk's electronics.
Some OSes cache disk blocks they expect to need again in a buffer cache.
A page cache connected to the virtual memory system is actually more efficient as
memory addresses do not need to be converted to disk block addresses and back
again.
Some systems ( Solaris, Linux, Windows 2000, NT, XP ) use page caching for both
process pages and file data in a unified virtual memory.
Page replacement strategies can be complicated with a unified cache, as one needs
to decide whether to replace process or file pages, and how many pages to
guarantee to each category of pages. Solaris, for example, has gone through many
variations, resulting in priority paging giving process pages priority over file I/O
pages, and setting limits so that neither can knock the other completely out of
memory.
Another issue affecting performance is the question of whether to
implement synchronous writes or asynchronous writes. Synchronous writes occur in
the order in which the disk subsystem receives them, without caching; Asynchronous
writes are cached, allowing the disk subsystem to schedule writes in a more efficient
order ( See Chapter 12. ) Metadata writes are often done synchronously. Some
systems support flags to the open call requiring that writes be synchronous, for
example for the benefit of database systems that require their writes be performed
in a required order.
The type of file access can also have an impact on optimal page replacement policies.
For example, LRU is not necessarily a good policy for sequential access files. For these
types of files progression normally goes in a forward direction only, and the most
recently used page will not be needed again until after the file has been rewound and
re-read from the beginning, ( if it is ever needed at all. ) On the other hand, we can
expect to need the next page in the file fairly soon. For this reason sequential access
files often take advantage of two special policies:
o Free-behind frees up a page as soon as the next page in the file is requested,
with the assumption that we are now done with the old page and won't need
it again for a long time.
o Read-ahead reads the requested page and several subsequent pages at the
same time, with the assumption that those pages will be needed in the near
future. This is similar to the track caching that is already performed by the
disk controller, except it saves the future latency of transferring data from the
disk controller memory into motherboard main memory.
The caching system and asynchronous writes speed up disk writes considerably,
because the disk subsystem can schedule physical writes to the disk to minimize
head movement and disk seek times.
Recovery:
A system failure (e.g., a sudden power outage) may result in:
1. Loss of data
2. Inconsistency of data
File system recovery techniques:
1. Consistency checker:
Compares the data in the directory structure with the data blocks on disk and tries to fix any inconsistencies.
Examples: fsck in UNIX, chkdsk in Windows.
2. Backup:
Use system programs to regularly back up data from the disk to another storage device (e.g.,
magnetic tape or another disk).
A lost file or disk can be recovered by restoring the data from the backup.
3. Log-structured (journaling) file system:
All metadata changes are written sequentially to a log; a set of operations that performs a specific task is treated as a transaction.
When the file system is successfully modified, the transaction is removed from the log.
If the file system crashes, all remaining transactions in the log must still be performed.
Network File System (NFS):
NFS is a protocol that allows users to access data and files remotely over a network. Because
it is an open standard, anyone can implement it, and users can manipulate remote files just as
if they were local. The protocol is built on the ONC RPC system.
NFS works over IP-based networks. It is implemented as a client/server application in which
the NFS server handles authentication, authorization, and client management. The protocol is
used with Apple macOS, Unix, and Unix-like operating systems such as Solaris, Linux,
FreeBSD, and AIX.
Benefits of NFS
Files can be stored centrally on a server yet appear local to each client, which simplifies
sharing, reduces duplicated storage, and allows centralized backup and administration.
I/O Management:
I/O Hardware:
In order to manage and control the various I/O devices attached to a computer, the I/O
system requires some hardware and software components. I/O devices commonly use
certain hardware components: the system bus and ports.
o Ports are the connection points (plugs) used to connect I/O devices to the computer.
o A bus is a set of wires to which these ports and I/O controllers are connected,
and through which signals are sent for I/O commands.
1. Polling
Polling is a software technique in which a program repeatedly checks the status of a device.
The device can be a disk drive or any other peripheral device in the computer. The program
polls the device for information, such as whether it has data available or not. Polling can be a
slow way to get data from a device, because the CPU must repeatedly check the device's
status and may spend many cycles waiting before the device is ready.
In some cases polling may be desirable, for example when several devices are being polled
and only one of them updates its state at a time; the others are simply checked again on the
next pass until they indicate that they are done.
Polling can also be used to check whether a device is online or not. If the device is offline,
this information can be used to take some appropriate action, such as pausing or suspending
tasks that depend on the device (e.g., stopping a backup in progress).
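A toy sketch of polling: the loop below repeatedly checks a simulated status register until the device reports that data is ready. The device object is entirely made up for illustration.

```python
import time
import random


class FakeDevice:
    """Simulated peripheral with a status register and a data register."""

    def __init__(self):
        self._ready_at = time.time() + random.uniform(0.1, 0.5)

    def status_ready(self):
        return time.time() >= self._ready_at

    def read_data(self):
        return b"device payload"


device = FakeDevice()

# Busy-wait polling loop: the CPU does nothing useful while it waits.
while not device.status_ready():
    pass

print(device.read_data())
```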
2. Interrupts
The CPU can be interrupted by several different devices, such as I/O hardware and peripheral
devices. An interrupt is the mechanism an I/O device uses to notify the CPU that it needs
attention, for example when a data transfer has completed or an error has occurred on an I/O
bus. Several devices may raise interrupts at the same time, but interrupts are delivered to the
CPU one at a time, according to their priority. Interrupts free the CPU from polling
continuously: the CPU can perform other work and respond only when a device actually
requires service, by running the corresponding interrupt handler.
3. Direct Memory Access:
DMA is a way to move data between main memory and I/O devices without involving the
CPU in every byte of the transfer. When a large transfer is needed from an I/O device (such
as a disk drive), the operating system programs the DMA controller with the source, the
destination in memory, and the number of bytes to transfer, and then continues with other
work. The DMA controller performs the transfer directly between the device and main
memory and raises an interrupt when the transfer is complete, so the CPU is involved only at
the start and at the end of the operation.
I/O Interface:
There is a need for an interface whenever a CPU wants to communicate with I/O devices.
The interface is used to interpret the address generated by the CPU. Thus, to share
information between the CPU and I/O devices, an interface is used, which is called the
I/O interface.
Various applications of I/O Interface:
One application of the I/O interface is that a file or device can be opened without detailed
knowledge of the device, i.e., even when basic information about it is unknown. The interface
also makes it possible to add new devices to the computer system without disrupting the
operating system. It abstracts the differences among I/O devices by identifying a few general
kinds; each general kind is accessed through a standardized set of functions, which is called
an interface.
Devices vary on many dimensions, as the following characteristics show.
1. Character-stream or Block:
Both character-stream and block devices transfer data in the form of bytes. The
difference is that a character-stream device transfers bytes in a linear fashion, one
after another, whereas a block device transfers a whole block of bytes as a single unit.
2. Sequential or Random Access:
A sequential device transfers data in a fixed order determined by the device, whereas
a random-access device allows the user to instruct the device to seek to any of its
data storage locations.
3. Synchronous or Asynchronous:
A synchronous device performs data transfers with predictable response times, in
coordination with other aspects of the system. An asynchronous device exhibits
irregular or unpredictable response times that are not coordinated with other
computer events.
4. Sharable or Dedicated:
A sharable device can be used concurrently by several processes or threads, whereas
a dedicated device cannot.
5. Speed of Operation:
Device speeds range from a few bytes per second to a few gigabytes per second.
6. Read-write, read-only, or write-only:
Different devices perform different operations; some support both input and output,
while others support only one data transfer direction, either input or output.
Kernel I/O Subsystem:
One of the key services provided by the I/O subsystem is the scheduling
of I/O requests. Scheduling involves determining the best order in which
to execute I/O requests to improve system performance, share device
access permissions fairly, and reduce the average waiting time, response
time, and turnaround time for I/O operations to complete. The OS
developers implement schedules by maintaining a wait queue of requests
for each device, and the I/O scheduler rearranges the order to improve
the efficiency of the system.
Buffering
A buffer is a memory area that stores data while it is transferred between two
devices or between a device and an application. Buffering copes with speed
mismatches between the producer and consumer of a data stream and with
devices that have different data-transfer sizes.
Caching
A cache is a region of fast memory that holds copies of data; access to the
cached copy is more efficient than access to the original. Caching frequently
used disk blocks and metadata improves I/O performance, although, unlike a
buffer, a cache holds only a copy of data that also exists elsewhere.
Error Handling
Error handling is another crucial function of the I/O subsystem, which
guards against many kinds of hardware and application errors. An OS that
uses protected memory can prevent a complete system failure from minor
mechanical glitches. Devices and I/O transfers can fail transiently or
permanently, but the OS can handle such failures in different ways.
I/O Protection
Finally, I/O protection ensures that user processes cannot issue illegal I/O
instructions to disrupt the normal function of a system. The I/O
subsystem implements various mechanisms to prevent such disruptions
by defining all I/O instructions as privileged instructions. Users cannot
issue I/O instructions directly, preventing illegal I/O access.
Transforming I/O Requests to Hardware Operations:
Modern operating systems gain significant flexibility from the multiple stages of lookup
tables in the path between a request and the physical device controller. A general mechanism
is used to pass requests between applications and drivers. Thus, new devices and drivers can
be introduced into the computer without recompiling the kernel. In fact, some operating
systems can load device drivers on demand. At boot time, the system first probes the
hardware buses to determine which devices are present, and it then loads the necessary
drivers, either immediately or when they are first required by an I/O request. The typical life
cycle of a blocking read request proceeds as follows.
1. System call –
When an I/O request arrives, the process issues a blocking read() system call on a
previously opened file descriptor. The system-call code in the kernel first checks the
parameters for correctness. If the requested data is already available in the buffer
cache, the data is returned to the process and the I/O request is completed.
2. Alternative approach if the input is not available –
If the data is not available in the buffer cache, physical I/O must be performed. The
process is removed from the run queue and placed on the wait queue for the device,
and the I/O request is scheduled. Eventually, the I/O subsystem sends the request to
the device driver, either via a subroutine call or via an in-kernel message, depending
on the operating system.
3. Role of the device driver –
The device driver allocates kernel buffer space to receive the data and schedules the
I/O. It then issues the command to the device controller by writing into the
device-control registers.
4. Role of the device controller –
The device controller operates the device hardware, which actually performs the data
transfer.
5. Role of the DMA controller –
The driver may poll for status and data, or it may have set up a DMA transfer into
kernel memory. In the latter case the transfer is managed by the DMA controller,
which generates an interrupt when the transfer completes.
6. Role of the interrupt handler –
The interrupt is delivered to the correct interrupt handler through the interrupt-vector
table. The handler stores any necessary data, signals the device driver, and returns
from the interrupt.
7. Completion of the I/O request –
When the device driver receives the signal, it determines that the I/O request has
completed, determines the request's status, and signals the kernel I/O subsystem that
the request has finished. After transferring the data or return codes to the process's
address space, the kernel moves the process from the wait queue back to the ready
queue.
8. Completion of the system call –
Moving the process to the ready queue unblocks it. When the process is next assigned
to the CPU, it resumes execution at the point where the system call completed.
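As a concrete illustration of step 1, the short Python sketch below issues a blocking read on a previously opened file descriptor using the raw os-level calls; the file name is just a placeholder for this example.

```python
import os

# Open a file and obtain a raw file descriptor (placeholder path for illustration).
fd = os.open("example.txt", os.O_RDONLY)

# Blocking read: if the data is not already in the buffer cache, the process
# sleeps until the kernel has copied up to 4096 bytes into our buffer.
data = os.read(fd, 4096)

print(f"Read {len(data)} bytes")
os.close(fd)
```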
STREAMS:
STREAMS is a UNIX System V mechanism that provides a full-duplex communication
channel between a user-level process and a device driver. A stream consists of a stream head
that interfaces with the user process, a driver end that controls the device, and zero or more
stream modules between them; each module contains a read queue and a write queue, and
message passing is used to transfer data between the queues.
Disk Management:
Disk Structure:
Modern disk drives are addressed as large one-dimensional arrays of logical blocks, where a
logical block is the smallest unit of transfer. This array of logical blocks is mapped onto the
sectors of the disk sequentially: sector 0 is the first sector of the first track on the outermost
cylinder, and the mapping proceeds in order through that track, then through the rest of the
tracks in that cylinder, and then through the remaining cylinders from outermost to innermost.
Disk Attachment:
Internal Disk Attachment
Definition
Internal disk attachment refers to storage devices mounted inside the computer case and
connected directly to the motherboard through interfaces such as SATA, SAS, or NVMe.
Advantages
Faster data transfer speeds − Internal disk attachment provides faster data
transfer speeds compared to external attachment methods, such as USB or
FireWire.
Better power management − Internal storage devices can be more easily
managed by the operating system's power management features, allowing for
more efficient power usage.
More secure − Since internal storage devices are physically connected to the
motherboard, they are less likely to be accidentally disconnected or removed.
Disadvantages
Limited expansion − Internal disk attachment limits the number of storage
devices that can be connected to a computer system. This can be problematic for
users who require a large amount of storage space.
The difficulty of access − Since internal storage devices are located inside the
computer system, accessing them for upgrades or repairs can be more difficult
and time-consuming.
Higher cost − Internal storage devices can be more expensive than external
devices due to their higher performance and reliability requirements.
External Disk Attachment
Definition
External disk attachment refers to storage devices connected to the computer from the
outside, typically through interfaces such as USB, FireWire, eSATA, or Thunderbolt.
Advantages
Portability − External storage devices can be easily transported and used on
multiple computer systems, making them ideal for users who require access to
their data on the go.
Ease of access − External storage devices are located outside the computer
system, making them easy to access for upgrades or repairs.
Expandability − External storage devices can be easily added or removed from a
computer system, allowing for more storage space as needed.
Disadvantages
Slower data transfer speeds − External disk attachments typically provide
slower data transfer speeds compared to internal attachment methods, such as
SATA.
Limited power management − External storage devices may not be as easily
managed by the operating system's power management features, leading to less
efficient power usage.
Less secure − External storage devices can be accidentally disconnected or
removed, leading to potential data loss or corruption.
Network Attached Storage (NAS)
Definition
Network Attached Storage (NAS) is a dedicated file-level storage device attached to a
network, allowing multiple users and client devices to store and retrieve files from a
centralized location.
Advantages
Easy to set up and manage − NAS devices are designed to be user-friendly, and
they can be easily configured and managed using a web-based interface.
Cost-effective − NAS devices are typically less expensive than other storage
architectures, such as Storage Area Networks (SANs), and they can offer high-
capacity storage for a relatively low cost.
Centralized storage − NAS devices provide a centralized storage location that
can be accessed by multiple users or devices on the network, which can be useful
for sharing files and backing up data.
Disadvantages
Limited performance − NAS devices may not offer the same level of
performance as other storage architectures, such as SANs, especially for high-
performance applications.
Limited scalability − NAS devices may be limited in terms of scalability,
especially for larger enterprise environments.
Network dependency − NAS devices rely on network connectivity, which can be
a potential point of failure or a bottleneck for storage access.
Storage Area Network
Definition
A Storage Area Network (SAN) is a dedicated, high-speed network that provides block-level
access to a consolidated pool of storage devices.
Advantage
SANs offer several advantages over other storage architectures. They can
provide high-speed, low-latency access to storage devices, which can be
critical for high-performance applications such as databases or virtualized
environments.
Disadvantage
SANs can also be complex and expensive to implement and maintain, and
they may require specialized skills and expertise to configure and
manage. They also require a dedicated network infrastructure, which can
add to the overall cost and complexity of the storage infrastructure.
SATA
Definition
Serial ATA (SATA) is a serial interface standard used to connect storage devices such as
hard disk drives and solid-state drives to the motherboard.
Advantages
Faster data transfer speeds − SATA provides faster data transfer speeds
compared to older parallel ATA (PATA) standards.
Higher storage capacity − SATA supports larger storage devices than PATA,
allowing for more data to be stored on a single device.
Disadvantages
Limited cable length − SATA cables are limited in length, which can be
problematic for larger computer systems.
The limited number of devices − SATA only supports a limited number of
devices per controller, which can be problematic for users who require a large
amount of storage space.
SCSI
Definition
The Small Computer System Interface (SCSI) is a set of standards for connecting and
transferring data between computers and peripheral devices such as disks and tape drives.
Advantages
High data transfer speeds − SCSI provides high data transfer speeds compared
to older standards, such as PATA.
Support for multiple devices − SCSI supports a large number of devices per
controller, making it ideal for users who require a large amount of storage space.
Disadvantages
Higher cost − SCSI devices can be more expensive than other attachment
methods due to their higher performance and reliability requirements.
Limited compatibility − SCSI devices may not be compatible with all computer
systems, which can be problematic for users who require a high-performance
storage solution.
SAS
Definition
Serial Attached SCSI (SAS) is a point-to-point serial interface that uses the SCSI command
set to connect storage devices, combining the reliability of SCSI with serial signaling.
Advantages
High data transfer speeds − SAS provides high data transfer speeds compared to
older standards, such as PATA.
Support for multiple devices − SAS supports a large number of devices per
controller, making it ideal for users who require a large amount of storage space.
Disadvantages
Higher cost − SAS devices can be more expensive than other attachment
methods due to their higher performance and reliability requirements.
Limited compatibility − SAS devices may not be compatible with all computer
systems, which can be problematic for users who require a high-performance
storage solution.
Disk Scheduling:
Disk scheduling algorithms decide the order in which pending disk I/O requests are
serviced. Two quantities dominate the cost of servicing a request:
Seek time: the time it takes for the disk head to move to the desired location
(cylinder) on the disk.
Rotational latency: the time taken by the disk to rotate the desired data sector
under the disk head.
First-Come-First-Serve
The First-Come-First-Served (FCFS) disk scheduling algorithm is one of the
simplest and most straightforward disk scheduling algorithms used in modern
operating systems. It operates on the principle of servicing disk access
requests in the order in which they are received. In the FCFS algorithm, the
disk head is positioned at the first request in the queue and the request is
serviced. The disk head then moves to the next request in the queue and
services that request. This process continues until all requests have been
serviced.
Shortest-Seek-Time-First
Shortest Seek Time First (SSTF) is a disk scheduling algorithm used in
operating systems to efficiently manage disk I/O operations. The goal of
SSTF is to minimize the total seek time required to service all the disk access
requests. In SSTF, the disk head moves to the request with the shortest seek
time from its current position, services it, and then repeats this process until
all requests have been serviced. The algorithm prioritizes disk access
requests based on their proximity to the current position of the disk head,
ensuring that the disk head moves the shortest possible distance to service
each request.
Example
For a request queue containing cylinders 20, 30, 60, 70, 90, and 150, with the head
initially at cylinder 50, SSTF gives a total seek time =
(60-50) + (70-60) + (90-70) + (90-30) + (30-20) + (150-20) = 240
SCAN
SCAN (Scanning) is a disk scheduling algorithm used in operating systems to
manage disk I/O operations. The SCAN algorithm moves the disk head in a
single direction and services all requests until it reaches the end of the disk,
and then it reverses direction and services all the remaining requests. In
SCAN, the disk head starts at one end of the disk, moves toward the other
end, and services all requests that lie in its path. Once the disk head reaches
the other end, it reverses direction and services all requests that it missed on
the way. This continues until all requests have been serviced.
Example
If we consider that the head direction is left in the case of SCAN, the total seek
time = (50-30) + (30-20) + (20-0) + (60-0) + (70-60) + (90-70) +
(150-90) = 200
C-SCAN
The C-SCAN (Circular SCAN) algorithm operates similarly to the SCAN
algorithm, but it does not reverse direction at the end of the disk. Instead,
the disk head wraps around to the other end of the disk and continues to
service requests. This algorithm can reduce the total distance the disk head
must travel, improving disk access time. However, this algorithm can lead to
long wait times for requests that are made near the end of the disk, as they
must wait for the disk head to wrap around to the other end of the disk
before they can be serviced. The C-SCAN algorithm is often used in modern
operating systems due to its ability to reduce disk access time and improve
overall system performance.
Example
For C-SCAN, the total seek time = (60-50) + (70-60) + (90-70) + (150-90)
+ (199-150) + (199-0) + (20-0) + (30-20) = 378
LOOK
The LOOK algorithm is similar to the SCAN algorithm, except that the disk arm
only goes as far as the final request in each direction instead of travelling all the
way to the end of the disk; it then reverses direction immediately. This reduces
the total distance the disk head must travel, improving disk access time, and it
avoids the unnecessary traversal to the edge of the disk that SCAN performs. The
LOOK algorithm is often used in modern operating systems due to its ability to
reduce disk access time and improve overall system performance.
Example
Considering the head direction is right, in this case, the total seek time =
(60-50) + (70-60) + (90-70) + (150-90) + (150-30) + (30-20) = 230
C-LOOK
C-LOOK is similar to the C-SCAN disk scheduling algorithm. In this algorithm,
the disk arm goes only as far as the last request to be serviced in front of the
head, instead of going all the way to the end of the disk; from there it jumps
directly to the last request at the other end and continues in the same direction.
Thus, it also avoids the extra delay caused by unnecessary traversal to the end of
the disk.
Example
For the C-LOOK algorithm, the total seek time = (60-50) + (70-60) + (90-
70) + (150-90) + (150-20) + (30-20) = 240
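A minimal sketch of how these totals can be computed, using the same request queue as the examples above (head at cylinder 50, requests 20, 30, 60, 70, 90, 150). FCFS and SSTF are shown; the other policies differ only in how the service order is chosen. The arrival order used for FCFS is an assumption for illustration.

```python
def total_seek_time(head, order):
    """Sum of head movements when requests are serviced in the given order."""
    total, pos = 0, head
    for cyl in order:
        total += abs(cyl - pos)
        pos = cyl
    return total


def sstf_order(head, requests):
    """Shortest-Seek-Time-First: always pick the closest pending request."""
    pending, order, pos = list(requests), [], head
    while pending:
        nxt = min(pending, key=lambda c: abs(c - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order


head = 50
queue = [60, 70, 90, 30, 20, 150]                       # assumed arrival order
print(total_seek_time(head, queue))                     # FCFS for that order
print(total_seek_time(head, sstf_order(head, queue)))   # SSTF -> 240, as above
```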
Disk Management:
The operating system is responsible for several areas of management, including:
1. Process Management
2. Memory Management
3. File and Disk Management
4. I/O System Management
Disk management in the OS includes various aspects, such as:
1. Disk Formatting
A new magnetic disk is largely a blank slate: it is just platters of magnetic
recording material. Before a disk can hold data, it must be divided into sectors
that the disk controller can read and write. This process is known as physical
formatting or low-level formatting.
Low-level formatting creates a special data structure for every sector on the
drive. The data structure for a sector is made up of a header, a data area, and
a trailer. The disk controller uses the header and trailer to store information
such as an error-correcting code (ECC) and a sector number.
To use the disk as a storage medium for files, the operating system still needs to
record its own data structures on the drive. It does this in two steps. The first
step is to partition the disk into one or more groups of cylinders; the OS can then
treat each partition as though it were a separate disk. For example, one partition
could hold a copy of the OS executable code, while another holds user files. The
second step, after partitioning, is logical formatting: in this stage the operating
system stores the initial file-system data structures on the disk.
2. Boot Block
When a system is turned on or restarted, it must execute an initial program.
The start program of the system is called the bootstrap program. It starts the
OS after initializing all components of the system. The bootstrap program
works by looking for the OS kernel on disk, loading it into memory, and
jumping to an initial address to start the OS execution.
Swap-Space Management:
Swap-space management is another low-level task of the operating system. Virtual memory
uses disk space as an extension of main memory. Since disk access is much slower than
memory access, using swap space significantly decreases system performance. The main goal
for the design and implementation of swap space is to provide the best throughput for the
virtual memory system.
RAID Structure:
RAID is a technique that makes use of a combination of multiple disks instead of
using a single disk for increased performance, data redundancy, or both. The term
was coined by David Patterson, Garth A. Gibson, and Randy Katz at the University
of California, Berkeley in 1987.
1. RAID-0 (Striping)
Blocks are "striped" across disks.
Evaluation
Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be
recovered.
Capacity: N*B
The entire space is being used to store data. Since there is no duplication,
N disks each having B blocks are fully utilized.
Advantages
1. It is easy to implement.
2. It utilizes the storage capacity in a better way.
Disadvantages
1. A single drive loss can result in the complete failure of the system.
2. Not a good choice for a critical system.
2. RAID-1 (Mirroring)
More than one copy of each block is stored on a separate disk. Thus, every
block has two (or more) copies, lying on different disks.
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
In this scheme, data is striped at the byte level across the data disks, and one
dedicated disk stores the parity of the corresponding bytes. For example, with
four disks, Disk 3 contains the parity bits for Disk 0, Disk 1, and Disk 2. If data
loss occurs, the lost data can be reconstructed using Disk 3.
Advantages
1. Data can be transferred in bulk.
2. Data can be accessed in parallel.
Disadvantages
1. It requires an additional drive for parity.
2. In the case of small-size files, it performs slowly.
5. RAID-4 (Block-Level Striping with Dedicated Parity)
Instead of duplicating data, this scheme adopts a parity-based approach: blocks
are striped across the data disks, and a dedicated disk stores one parity block
for each stripe.
Assume that block C3 of a stripe is lost due to a disk failure. We can then
recompute the data stored in C3 by looking at the values of all the other blocks
in the stripe and the parity block; this allows us to recover lost data. (A small
XOR sketch of this idea appears after this subsection.)
Evaluation
Reliability: 1
RAID-4 allows recovery of at most 1 disk failure (because of the way
parity works). If more than one disk fails, there is no way to recover the
data.
Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1)
disks are made available for data storage, each disk having B blocks.
Advantages
1. It helps in reconstructing the data if at most one disk fails.
Disadvantages
1. It cannot help with reconstruction when more than one disk fails.
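A minimal sketch of the parity idea behind RAID-4 and RAID-5: the parity block is the bitwise XOR of the data blocks, so any single lost block can be rebuilt by XOR-ing the surviving blocks with the parity. The block contents below are made-up byte strings for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bitwise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# Three data blocks striped across three disks (illustrative contents).
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])          # stored on the dedicated parity disk

# Disk 1 fails: rebuild its block from the survivors plus the parity block.
rebuilt_d1 = xor_blocks([d0, d2, parity])
assert rebuilt_d1 == d1
```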
6. RAID-5 (Block-Level Striping with Distributed Parity)
This is a slight modification of the RAID-4 system where the only
difference is that the parity rotates among the drives.
7. RAID-6 (Block-Level Striping with Double Distributed Parity)
This scheme extends RAID-5 by storing two independent parity blocks per stripe,
so the array can tolerate the failure of up to two disks.
Advantages
1. Very high data Accessibility.
2. Fast read data transactions.
Disadvantages
1. Due to double parity, it has slow write data transactions.
2. Extra space is required.
Stable Storage Implementation:
To achieve such storage, we need to replicate the required information on multiple
storage devices with independent failure modes. The writing of an update must be
coordinated in such a way that a failure cannot leave all the copies in a damaged
state, and so that, when recovering from a failure, we can force all copies to a
consistent and correct value even if another failure occurs during the recovery.
A disk write results in one of three outcomes:
1. Successful completion –
The data will be written correctly on the disk.
2. Partial Failure –
In this case, failure has occurred in the middle of the data transfer, such
that only some sectors were written with the new data, and the sectors
which were written during the failure may have been corrupted.
3. Total Failure –
The failure occurred before the disk write started, so the previous data
values on the disk remains intact.
1. Write the information onto the first physical block.
2. When the first write completes successfully, perform the same operation
onto the second physical block.
3. When both the operations are successful, declare the operation as
complete.
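A toy sketch of this two-copy protocol, using two ordinary files as the "physical blocks" (the paths are placeholders); a real implementation would also verify each copy with checksums and use the surviving good copy during recovery.

```python
import os


def stable_write(data: bytes, copy1="block_a.bin", copy2="block_b.bin"):
    """Write data to the first copy, flush it to disk, then write the second copy."""
    for path in (copy1, copy2):
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # wait until this copy is durably on disk
    # Only after both writes completed successfully is the operation "complete".
    return True


stable_write(b"important record")
```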
Tertiary-Storage Structure:
Tertiary storage units are widely employed for offsite storage or for the long-term
retention of volumes of data that are rarely accessed. Tape libraries, optical
jukeboxes, and cloud storage are a few examples of tertiary storage systems. In tape
libraries, data is kept on magnetic tapes, which are affordable and long-lasting but
slower to access than other forms of storage. Optical jukeboxes, which store data on
optical discs such as CDs or DVDs, are generally faster than tape libraries but have a
shorter lifespan.
Features
Low cost: Because tertiary storage is intended for rarely accessed data and
does not have to be as quick or dependable, it is typically less expensive
than primary and secondary storage.
Large storage capacity: Tertiary storage devices are made to hold a lot of
data, usually between terabytes and petabytes.
Offsite storage: Tertiary storage systems are frequently used for offsite
storage, which can add security and safeguard against data loss due to
disasters or other unforeseen circumstances.
Slow access: Tertiary storage is not designed for frequent use, hence it
often accesses more slowly than main and secondary storage.
Storage for the long term: Tertiary storage is frequently used to store
data for the long term that is not in use but must be kept for regulatory or
compliance reasons, or for data archiving.
Data backup and recovery: Tertiary storage is frequently used for data
backup and recovery because it offers an affordable and dependable way
to store data that might be required in the event of data loss or corruption.
Advantages
Large storage capacity: Tertiary storage offers significantly larger
storage capacity compared to primary and secondary storage, making it
ideal for storing large amounts of data that may not fit in primary or
secondary storage.
Cost-effective: Tertiary storage is typically more cost-effective than
primary and secondary storage, as it is designed for large-scale data
storage and is available in high-capacity devices.
Easy accessibility: Data stored on tertiary storage can still be accessed
and retrieved as needed, even though it is not in active use, although
access is slower than with primary and secondary storage.
Improved data backup and recovery: Tertiary storage provides a
convenient backup solution for critical data and enables easy data
recovery in case of a failure or data loss in primary or secondary storage.
Long-term data preservation: Tertiary storage is designed for long-term
data preservation, making it ideal for archiving data that is not frequently
used but must be kept for regulatory or historical purposes.
Scalability: Tertiary storage can be easily scaled up or down to meet
changing storage requirements, making it a flexible and adaptable solution
for organizations of any size.
Applications
Backup and Recovery: Tertiary storage is commonly used to store
backups of critical data to protect against data loss due to hardware failure
or other forms of data corruption.
Archiving: Tertiary storage can be used to store large amounts of
historical data that is not frequently accessed but still needs to be
preserved for regulatory, legal, or business reasons.
Digital Preservation: Tertiary storage is used to store and preserve
valuable digital content such as historical documents, audio and video
recordings, and photographs.
Big Data Analytics: Tertiary storage systems can store large amounts of
raw data that can be processed and analyzed for insights and decision-
making.
Cloud Storage: Tertiary storage is a component of cloud storage
solutions, where data is stored remotely and accessed over the internet.
Data Warehouses: Tertiary storage is used to store large amounts of
structured data for business intelligence and data analysis.
Data Lakes: Tertiary storage is used to store raw and unstructured data
for later processing and analysis.
Limitations
Data saved on tertiary storage is not always accessible because retrieving
data from tertiary storage takes longer than from primary or secondary
storage.
Tertiary storage is not designed for regular use, hence it often takes longer
to access than main and secondary storage.
Data kept on tertiary storage may be hard to access because it may be
stored offsite and require specialist equipment, which can make it harder
to recover data quickly.
Since it often necessitates the use of off-site storage facilities and
specialist technology, retrieving data from tertiary storage can be costly.
Data loss due to physical deterioration or other problems may occur in
tertiary storage devices like tape libraries because of their limited lifespan.
Because tertiary storage is not designed for active users and may not have
the same level of protection against data loss or corruption as primary and
secondary storage, it may not offer the same level of data security as those
two storage types.