Concept of Files
A file is a named collection of related information that is recorded on secondary storage
such as magnetic disks, magnetic tapes and optical disks.
In general, a file is a sequence of bits, bytes, lines or records whose meaning is defined
by the file’s creator and user.
The operating system manages the files of the computer system.
Computers can store information on various storage media, such as magnetic disks,
magnetic tapes and optical disks.
Files are mapped onto the hardware devices by the operating system.
File operations
The operating system performs the following basic file operations:
i. Create
ii. Delete
iii. Open
iv. Close
v. Read
vi. Write
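The six operations above can be sketched with Python's built-in file API. This is only an illustration: the file name notes.txt and the temporary directory are assumptions, and the operating system performs the real work behind each call.

```python
import os
import tempfile

# Hypothetical file name, placed in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

f = open(path, "w")        # create the file and open it for writing
f.write("hello, file\n")   # write
f.close()                  # close

f = open(path, "r")        # open the existing file for reading
data = f.read()            # read
f.close()                  # close

os.remove(path)            # delete
file_exists_after_delete = os.path.exists(path)
```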
Sequential access
• Data is accessed one record after another, in order.
• A read command moves the file pointer ahead by one record.
• A write command allocates space for the record and moves the pointer to the new end
of file.
• Such a method is reasonable for tape.
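A minimal in-memory sketch of sequential access, assuming a file is simply a list of records with one current-position pointer. The class and method names (SequentialFile, read_next, write_next, rewind) are hypothetical.

```python
class SequentialFile:
    """Toy model of a sequentially accessed file."""

    def __init__(self):
        self.records = []  # records in the order they were written
        self.pos = 0       # current position of the file pointer

    def read_next(self):
        """Return the record at the pointer and move the pointer ahead by one."""
        if self.pos >= len(self.records):
            return None    # end of file
        record = self.records[self.pos]
        self.pos += 1
        return record

    def write_next(self, record):
        """Append a record and move the pointer to the new end of file."""
        self.records.append(record)
        self.pos = len(self.records)

    def rewind(self):
        """Move the pointer back to the beginning, as a tape would rewind."""
        self.pos = 0
```

There is no way to jump to an arbitrary record here: reaching record k means calling read_next k times after a rewind.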
Direct/Random access
• This method is useful for disks.
• The file is viewed as a numbered sequence of blocks or records.
• There are no restrictions on which blocks are read or written; this can be done in any
order.
• The user now says "read n" rather than "read next".
• "n" is a number relative to the beginning of file, not relative to an absolute
physical disk location.
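The "read n" idea can be sketched as follows, assuming fixed-size blocks and using an in-memory byte stream in place of a disk file. BLOCK_SIZE, read_block and write_block are hypothetical names chosen for the illustration.

```python
import io

BLOCK_SIZE = 4  # hypothetical block size in bytes

def read_block(f, n):
    """Read block n, where n is relative to the beginning of the file."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)

def write_block(f, n, data):
    """Overwrite block n with exactly one block of data."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    f.seek(n * BLOCK_SIZE)
    f.write(data)

# A four-block file held in memory.
disk_file = io.BytesIO(b"AAAABBBBCCCCDDDD")
```

Blocks can be touched in any order, for example read_block(disk_file, 2) followed by read_block(disk_file, 0).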
Advantages of indexed sequential access:
• An indexed sequential access file supports both sequential and random
access.
• Records are accessed very quickly if the index table is properly organized.
• The records can be inserted in the middle of the file.
• It provides quick access for sequential and direct processing.
• It reduces the degree of the sequential search.
Disadvantages of indexed sequential access:
• An indexed sequential access file requires unique keys and periodic reorganization.
• Searching the index adds time to each data
access or retrieval.
• It requires more storage space.
• It is expensive because it requires special software.
• It is less efficient in the use of storage space as compared to other file
organizations.
Contiguous Allocation
In this scheme, each file occupies a contiguous set of blocks on the disk.
For example, if a file requires n blocks and is given a block b as the starting location, then
the blocks assigned to the file will be: b, b+1, b+2……b+n-1.
This means that given the starting block address and the length of the file (in terms of
blocks required), we can determine the blocks occupied by the file.
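The address arithmetic is simple enough to write down directly. In this sketch the function names are hypothetical, and b and n have the same meaning as in the text.

```python
def contiguous_blocks(b, n):
    """Blocks occupied by a file that starts at block b and is n blocks long."""
    return list(range(b, b + n))

def logical_to_physical(b, n, i):
    """Disk block holding the i-th block of the file (i counted from 0)."""
    if not 0 <= i < n:
        raise IndexError("block index outside the file")
    return b + i
```

For a file with b = 14 and n = 3, contiguous_blocks(14, 3) gives blocks 14, 15 and 16, and any of them can be located with a single addition.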
Linked Allocation
In this scheme, each file is a linked list of disk blocks which need not be contiguous.
The disk blocks can be scattered anywhere on the disk.
The directory entry contains pointers to the first and the last block of the file.
Each block contains a pointer to the next block occupied by the file.
The file ‘jeep’ in the following image shows how the blocks can be scattered across the disk. The
last block (25) contains -1, a null pointer indicating that no further block follows.
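A small sketch of following such a chain. Only the final block (25, holding -1) comes from the text; the other block numbers and the starting block 9 are hypothetical, as are the function names.

```python
NULL = -1

# next_pointer[b] models the pointer stored inside disk block b.
# Block numbers other than 25 are made up for the illustration.
next_pointer = {9: 16, 16: 1, 1: 10, 10: 25, 25: NULL}

def file_blocks(start):
    """Follow the chain from the starting block and list every block of the file."""
    blocks = []
    block = start
    while block != NULL:
        blocks.append(block)
        block = next_pointer[block]
    return blocks

def nth_block(start, i):
    """Find the i-th block (from 0); this costs i pointer reads, one per hop."""
    block = start
    for _ in range(i):
        block = next_pointer[block]
        if block == NULL:
            raise IndexError("file has fewer blocks than requested")
    return block
```

The loop in nth_block is why direct access is slow under this scheme: there is no way to reach block i without visiting the blocks before it.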
Advantages of Linked Allocation:
• No external fragmentation.
Disadvantages of Linked Allocation:
• It handles sequential access efficiently but is not suited to direct access
• Each block contains a pointer, wasting space
• Blocks may be scattered across the disk, so a large number of disk seeks may be necessary
• Reliability: what if a pointer is lost or damaged?
Indexed Allocation
In this scheme, a special block known as the Index block contains the pointers to all the
blocks occupied by a file. Each file has its own index block.
The ith entry in the index block contains the disk address of the ith file block.
The directory entry contains the address of the index block, as shown in the image.
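The lookup path (directory entry, then index block, then data block) can be sketched as below. All block numbers, the file name and the function name are hypothetical.

```python
# directory maps a file name to the address of its index block.
directory = {"jeep": 19}

# index_blocks[addr] models the contents of the index block stored at addr:
# entry i is the disk address of the i-th block of the file.
index_blocks = {19: [9, 16, 1, 10, 25]}

def physical_block(name, i):
    """Disk address of the i-th block (from 0) of the named file."""
    index_block = index_blocks[directory[name]]
    if not 0 <= i < len(index_block):
        raise IndexError("block index outside the file")
    return index_block[i]
```

Unlike the linked scheme, reaching block i takes one index lookup regardless of i, which is what makes direct access practical.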
Bit Vector
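• A bit vector (bitmap) is one method of free space management in operating systems. Each
disk block is represented by one bit: 1 if the block is free and 0 if it is allocated.
• For example, consider a disk having 16 blocks where block numbers 2, 3, 4, 5, 8, 9, 10,
11, 12, and 13 are free, and the rest of the blocks, i.e., block numbers 0, 1, 6, 7, 14 and 15,
are allocated to some files. The bit vector for this disk is:
0011110011111100
• We can find the first free block number from the bit vector using the following formula:
block number = (number of bits per word) * (number of 0-valued words) + (offset of the first 1 bit)
• We will now find the first free block number in the above example.
The first group of 8 bits (00111100) constitutes a non-zero word since not all of its bits are 0. After
finding the non-zero word, we look for the first 1 bit. It is the third bit of the word, so,
counting offsets from 0 as the block numbers do, offset = 2.
Therefore, the first free block number = 8 * 0 + 2 = 2, which agrees with the list of free blocks above.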
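The search can be sketched in a few lines, assuming 8-bit words, bits read from left to right, and blocks and offsets both numbered from 0. The names BITS_PER_WORD and first_free_block are hypothetical.

```python
BITS_PER_WORD = 8

def first_free_block(words):
    """Return the number of the first free block (first 1 bit), or None."""
    for word_index, word in enumerate(words):
        if word != 0:
            # Offset of the leftmost 1 bit, counted from 0 at the left end.
            offset = BITS_PER_WORD - word.bit_length()
            return BITS_PER_WORD * word_index + offset
    return None  # every word is 0: no free block

# The 16-block example, split into two 8-bit words.
bit_vector = [0b00111100, 0b11111100]
```

For the example disk this returns 2, the first block listed as free. Real systems typically lean on a hardware instruction to find the first set bit in a word rather than computing it in software.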
Advantages
The advantages of the bit vector method are:
• It is simple to understand.
• Finding the first free block, or a run of consecutive free blocks, is efficient.
• It is compact: only one bit is needed per block.
Disadvantages
The disadvantages of the bit vector method are:
• To find a free block, the operating system may need to search the entire bit vector.
• Finding the first 1 bit in a non-zero word quickly depends on hardware support in the
form of bit-manipulation instructions.
• Keeping the bit vector in main memory is practical for smaller disks but not necessarily for larger
ones. For example, a 1.3 GB disk with 512-byte blocks would need a bit vector of over
332 KB to track its free blocks. Setting aside that much memory just to track free space
becomes costly as disks grow.
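The 332 KB figure can be checked with a short calculation, assuming 1 GB = 2^30 bytes and 1 KB = 1024 bytes. The function name is hypothetical.

```python
import math

def bit_vector_bytes(disk_bytes, block_bytes):
    """Bytes needed for a bit vector with one bit per disk block."""
    blocks = disk_bytes // block_bytes
    return math.ceil(blocks / 8)

# 1.3 GB disk with 512-byte blocks, as in the example.
size_kb = bit_vector_bytes(int(1.3 * 2**30), 512) / 1024
```

Under these assumptions size_kb comes out between 332 and 333, matching the "over 332 KB" in the text.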
Linked list
• Another method of free space management in operating systems is a linked list. In
this method, all the free blocks on the disk are linked together in a list. The
address of the first free block is kept in a special location on the disk and cached in
memory. Each free block contains a pointer to the next free block. The last free block
holds a null pointer, indicating the end of the list.
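A minimal sketch of such a free list, assuming allocation always takes the first free block and a freed block is put back at the head. The class name FreeList and its methods are hypothetical; the dictionary stands in for the pointers stored inside the free blocks.

```python
NULL = -1

class FreeList:
    """Toy linked list of free disk blocks."""

    def __init__(self, free_blocks):
        self.next_free = {}  # next_free[b] models the pointer stored in block b
        self.head = NULL     # address of the first free block
        for block in reversed(free_blocks):
            self.next_free[block] = self.head
            self.head = block

    def allocate(self):
        """Take the first free block off the list, or return None if empty."""
        if self.head == NULL:
            return None
        block = self.head
        self.head = self.next_free.pop(block)
        return block

    def free(self, block):
        """Return a block to the list by linking it in at the head."""
        self.next_free[block] = self.head
        self.head = block
```

Allocating or freeing a single block touches only the head, but listing all free blocks means following every pointer in turn.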
Advantages
The advantages of the linked list method are:
• External fragmentation is prevented: any free block can be used, so, unlike with contiguous
allocation, no disk blocks are wasted.
• It is simple to make a file bigger. All we have to do is link a new block to
the list, so the file can grow as long as free blocks are available.
• Since the directory only needs to hold the starting and ending pointers of the file, linked
list allocation places less strain on it.
Disadvantages
The disadvantages of the linked list method are:
• Traversing the list is inefficient, since each block must be read to find the next one, which
takes more I/O time.
• There is an overhead in maintaining the pointers.
• There is no provision for random or direct access. To reach a
certain block, we must follow the list from its start until we find
it.
Grouping
• The third method of free space management in operating systems is grouping. This
method is a modification of the linked list method. In this method, the first free block
stores the addresses of n free blocks. The first n-1 of these blocks are actually free. The last
block of these n contains the addresses of the next n free blocks, and so on.
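A sketch of walking the groups, assuming n = 4, a null address of -1 to mark the final group, and a made-up layout of free blocks. The dictionary contents and the function name are hypothetical.

```python
NULL = -1

# contents[b] models the addresses stored in group block b (n = 4 here).
# The last address in each list names the block holding the next group.
contents = {
    2: [3, 4, 5, 8],      # 3, 4, 5 are free; block 8 holds the next group
    8: [9, 10, 11, 12],   # 9, 10, 11 are free; block 12 holds the next group
    12: [13, NULL],       # final, shorter group: 13 is free, no next group
}

def all_free_blocks(first):
    """Collect every free block and count how many group blocks were read."""
    free, reads = [], 0
    group = first
    while group != NULL:
        reads += 1
        free.append(group)  # the group block itself is also a free block
        *members, next_group = contents[group]
        free.extend(members)
        group = next_group
    return free, reads
```

Here ten free blocks are found with three block reads, where a plain linked list would need one read per free block.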
Advantages
The advantages of the grouping method are:
• The addresses of a large number of free blocks can be found quickly, since reading one
block yields a whole group of them.
• It is a modification of the free list approach that removes the need to traverse the whole list block by block.
Disadvantages
The disadvantages of the grouping method are:
• The space of one block in each group is used for storing addresses, since the nth block holds
the addresses of the next n free blocks.
• Only the address of the first free block is kept directly, since a complete list of
all free disk addresses is not maintained.
• There is an overhead in maintaining the index of blocks.
Counting
• This is the fourth method of free space management in operating systems. This method is
also a modification of the linked list method. This method takes advantage of the fact that
several contiguous blocks may be allocated or freed simultaneously. In this method, a
linked list is maintained but in addition to the pointer to the next free block, a count of
free contiguous blocks that follow the first block is also maintained. Thus, each entry
in the list will contain two things:
• A pointer to the next free block.
• The number of free contiguous blocks following it.
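A sketch of the counting scheme, assuming each entry is stored at the first block of a free run and a first-fit search is used when a run of a given length is requested. The layout, the dictionary runs and the function names are hypothetical.

```python
NULL = -1

# runs[b] = (number of free blocks following b, first block of the next run).
# Two runs: blocks 2-5 and blocks 8-13.
runs = {2: (3, 8), 8: (5, NULL)}

def all_free_blocks(head):
    """Expand every (start, count) entry into individual block numbers."""
    free = []
    block = head
    while block != NULL:
        following, next_run = runs[block]
        free.extend(range(block, block + following + 1))
        block = next_run
    return free

def find_run(head, needed):
    """First-fit: start of the first run with at least `needed` free blocks."""
    block = head
    while block != NULL:
        following, next_run = runs[block]
        if following + 1 >= needed:
            return block
        block = next_run
    return None
```

Ten free blocks are described by just two entries, which is why the list stays short when free space comes in long contiguous runs.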
Advantages
The advantages of the counting method are:
• Fast allocation of a large number of consecutive free blocks.
• Random access to the free block is possible.
• The overall list is smaller in size.
Disadvantages
The disadvantages of the counting method are:
• Each entry requires more space on the disk for keeping the count.
• For efficient insertion, deletion, and traversal, the entries need to be stored in a
B-tree.
• The entire area is reduced.
File Sharing
• File sharing, also known as file-swapping, is the accessing or sharing of files by one or
more users. It is performed on computer networks as a quick way to transmit data.
A file-sharing system generally has more than one administrator, and its users
may have the same or different access privileges. It can also imply that each user has an allocated
number of personal files in the common storage.
• File sharing has been used in mainframe and multi-user computer systems for many
years, and now, with widespread access to the internet, a file transfer system known as
the File Transfer Protocol (FTP) is widely used.
• File sharing is the public or private sharing of computer data or space in a network with
various levels of access privilege. It allows a number of people to use the same file or files
through some combination of being able to read or view them, write to or modify them, copy them, or
print them. File sharing can also mean having an allocated amount of personal file storage in
a common file system.
• Concurrent access anomalies: the same file may be accessed by several users at once in a multi-user
system
• Data isolation: related data required by different programs of the same application may
reside in different, isolated files
• Data redundancy
• Data inconsistency
• Duplication of data
Centralized Directory
• It is somewhat similar to client-server architecture in the sense that it maintains a huge
central server to provide directory service.
All the peers inform this central server of their IP address and the files they are making
available for sharing.
The server queries the peers at regular intervals to check whether they are still
connected.
In effect, this server maintains a large database recording which file is present at
which IP addresses.
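The database kept by the central server can be sketched as a mapping from file name to peer addresses. The class name, method names and the sample IP addresses are hypothetical, and the periodic liveness check is reduced here to an explicit unregister call.

```python
class DirectoryServer:
    """Toy central directory: which peers offer which files."""

    def __init__(self):
        self.files = {}  # file name -> set of peer IP addresses

    def register(self, ip, file_names):
        """A peer announces its IP address and the files it shares."""
        for name in file_names:
            self.files.setdefault(name, set()).add(ip)

    def unregister(self, ip):
        """Drop a peer, e.g. after it fails a periodic liveness check."""
        for peers in self.files.values():
            peers.discard(ip)

    def lookup(self, name):
        """Return the addresses of peers that currently offer the file."""
        return sorted(self.files.get(name, set()))
```

The directory only answers "who has this file"; the file itself would then be fetched directly from one of the returned peers.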
Query Flooding
Unlike the centralized approach, this method makes use of distributed systems.
In this method, the peers are connected through an overlay network: if a
connection or path exists from one peer to another, it is part of this overlay network.
In this overlay network, peers are called nodes, and the connection between peers is
called an edge between the nodes, thus resulting in a graph-like structure.
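One way to picture a query spreading over such a graph is a breadth-first flood with a hop limit. This is a sketch under assumptions: the hop limit, the overlay layout, the shared files and the function name are all hypothetical and are not taken from the text.

```python
from collections import deque

# Overlay graph: node -> neighbouring nodes (edges are connections between peers).
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

# Files each node is sharing.
shared = {"D": {"notes.txt"}, "E": {"song.mp3"}}

def flood_query(start, file_name, max_hops):
    """Forward the query to neighbours, hop by hop, and collect nodes with the file."""
    hits = []
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if file_name in shared.get(node, set()):
            hits.append(node)
        if hops == max_hops:
            continue  # hop limit reached: do not forward any further
        for neighbour in overlay[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, hops + 1))
    return hits
```

With this layout a query from A for song.mp3 reaches E only if at least three hops are allowed, which shows the trade-off between search reach and network traffic.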
Exploiting heterogeneity
This P2P architecture makes use of both the above-discussed systems.
It resembles a distributed system like Gnutella because there is no central server for query
processing.
But unlike Gnutella, it does not treat all its peers equally. The peers with higher
bandwidth and network connectivity are given higher priority and are called group
leaders or super nodes. The rest of the peers are assigned to these super nodes.
These super nodes are interconnected and the peers under these super nodes inform their
respective leaders about their connectivity, IP address, and the files available for sharing.