CSC 202 Note
A file consists of a number of records. Each record is made up of a number of fields and
each field consists of a number of characters.
The logical components deal with the real-world objects the data represent. These are field,
record and file. However, in today’s information systems, files most often exist as parts of
databases, or organised collections of interrelated data.
Field
A field is the basic element of data. An individual field contains a single value, such as an
employee’s last name, a date, or the value of a sensor reading. It is characterised by its
length and data type (e.g., ASCII, string, decimal). Depending on the file design,
fields may be fixed length or variable length. In the latter case, the field often consists
of two or three subfields: the actual value to be stored, the name of the field, and, in
some cases, the length of the field. In other cases of variable-length fields, the length of the
field is indicated by the use of special demarcation symbols between fields.
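The two variable-length layouts described above can be sketched as follows. This is a minimal illustration only: the function names, the one-byte length subfields and the `|` and `=` demarcation symbols are assumptions made for the example, not rules of any particular file system.

```python
def encode_field(name: str, value: str) -> bytes:
    """Encode one field as: name length, name, value length, value.

    Each length subfield is a single byte, so names and values are
    limited to 255 bytes in this sketch (an illustrative assumption).
    """
    name_bytes = name.encode("utf-8")
    value_bytes = value.encode("utf-8")
    if len(name_bytes) > 255 or len(value_bytes) > 255:
        raise ValueError("name or value too long for a one-byte length")
    return (bytes([len(name_bytes)]) + name_bytes
            + bytes([len(value_bytes)]) + value_bytes)


def decode_fields(data: bytes) -> list[tuple[str, str]]:
    """Decode a run of fields written by encode_field."""
    fields = []
    pos = 0
    while pos < len(data):
        name_len = data[pos]
        name = data[pos + 1:pos + 1 + name_len].decode("utf-8")
        pos += 1 + name_len
        value_len = data[pos]
        value = data[pos + 1:pos + 1 + value_len].decode("utf-8")
        pos += 1 + value_len
        fields.append((name, value))
    return fields


def split_delimited(line: str, field_sep: str = "|", name_sep: str = "=") -> dict[str, str]:
    """Parse fields separated by demarcation symbols instead of lengths."""
    pairs = (item.split(name_sep, 1) for item in line.split(field_sep) if item)
    return {name: value for name, value in pairs}


packed = encode_field("last_name", "Okafor") + encode_field("dept", "CSC")
print(decode_fields(packed))
print(split_delimited("last_name=Okafor|dept=CSC"))
```

The length-prefixed form lets a reader skip a field without inspecting its contents, while the delimited form is easier to read but requires that the delimiters never appear inside a value.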
Record
A record is a collection of related fields that can be treated as a unit by some application
program. For example, an employee record would contain such fields as name,
identification number, job designation, date of employment, and so on. Again, depending on
design, records may be of fixed length or variable length. A record will be of variable
length if some of its fields are of variable length or if the number of fields may vary. In the
latter case, each field is usually accompanied by a field name. In either case, the entire
record usually includes a length field.
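As a rough illustration of a fixed-length record, the sketch below packs the employee fields mentioned above with Python's `struct` module. The field widths and the `YYYYMMDD` date format are hypothetical choices for the example.

```python
import struct

# Hypothetical layout: 20-byte name, 4-byte unsigned id, 10-byte job title,
# 8-byte date of employment (YYYYMMDD). "<" means no padding between fields.
EMPLOYEE = struct.Struct("<20sI10s8s")


def pack_employee(name: str, emp_id: int, job: str, hired: str) -> bytes:
    """Return one fixed-length record; struct pads short strings with NUL bytes."""
    return EMPLOYEE.pack(name.encode("ascii"), emp_id,
                         job.encode("ascii"), hired.encode("ascii"))


def unpack_employee(record: bytes) -> tuple[str, int, str, str]:
    """Split a fixed-length record back into its fields."""
    name, emp_id, job, hired = EMPLOYEE.unpack(record)
    return (name.rstrip(b"\0").decode("ascii"), emp_id,
            job.rstrip(b"\0").decode("ascii"), hired.decode("ascii"))


record = pack_employee("Ada Obi", 1042, "Analyst", "20190304")
print(EMPLOYEE.size, unpack_employee(record))
```

Because every record has the same size, the n-th record of a file starts at byte offset n times the record size, which is what makes fixed-length records convenient for direct access.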
File
A file is a collection of related records. The file is treated as a single entity by users and
applications and may be referenced by name. Files have names and may be created and
deleted. Access control restrictions usually apply at the file level. That is, in a shared
system, users and programs are granted or denied access to entire files. In some more
sophisticated systems, such controls are enforced at the record or even the field level.
File Naming
Files are abstraction mechanisms. They provide a way to store information and read it back
later. This must be done in such a way as to shield the user from the details of how and where
the information is stored, and how the disks actually work. When a process creates a file, it
gives the file a name. When the process terminates, the file continues to exist, and can be
accessed by other processes using its name.
The exact rules for file naming vary somewhat from system to system, but all operating
systems allow strings of one to eight letters as legal file names. The file name is chosen by
the person creating it, usually to reflect its contents. There are few constraints on the format
of the file name: It can comprise the letters A-Z, numbers 0-9 and special characters $
# & + @ ! ( ) - { } ' ` _ ~ as well as space. The only symbols that cannot be used to identify
a file are * | < > \ ^ = ? / [ ] " ; , plus control characters. The main caveat in choosing a file
name is that different operating systems have different rules, which can present
problems when files are moved from one computer to another. For example, Microsoft Windows
is case insensitive, so files like MYEBOOKS, myebooks, MyEbooks are all the same
to Microsoft Windows. However, under the UNIX operating system, all three would be
different files as, in this instance, file names are case sensitive.
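One practical consequence is that names which are distinct on a case-sensitive system can collide when copied to a case-insensitive one. The sketch below checks a list of names for such collisions. The function name is hypothetical, and `str.casefold()` is only a stand-in for whatever comparison rule a real file system applies.

```python
from collections import defaultdict


def case_collisions(names: list[str]) -> list[list[str]]:
    """Group names that differ only by letter case.

    str.casefold() is used as a stand-in for a case-insensitive file
    system's comparison; real systems apply their own folding rules.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(case_collisions(["MYEBOOKS", "myebooks", "MyEbooks", "notes.txt"]))
```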
Naming Convention
Usually a file would have two parts with “.” separating them. The part on the left side of
the period character is called the main name while the part on the right side is called the
extension. A good example of a file name is “course.doc.” The main name is course while
the extension is doc. File extension differentiates between different types of files. We can
have files with same names but different extensions and therefore we generally refer to a
file with its name along with its extension and that forms a complete file name.
A filename extension is a suffix to the name of a computer file applied to indicate the
encoding convention or file format of its contents. In some operating systems (for example
UNIX) it is optional, while in some others (such as DOS) it is a requirement. Some operating
systems limit the length of the extension (such as DOS and OS/2, to three characters)
while others (such as UNIX) do not. Some operating systems (for example RISC OS) do not
use file extensions.
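Splitting a complete file name into its main name and extension can be sketched with the standard `os.path.splitext` function. The wrapper name `split_file_name` is an assumption for the example.

```python
import os.path


def split_file_name(file_name: str) -> tuple[str, str]:
    """Return (main name, extension without the leading period)."""
    main_name, extension = os.path.splitext(file_name)
    return main_name, extension.lstrip(".")


print(split_file_name("course.doc"))
print(split_file_name("README"))
```

Only the last period is treated as the separator, so a name with no period simply has an empty extension.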
TEXT
FILE TYPE | CONTENT | APPLICATION
.html | Hypertext Mark-Up Language, the code of simple Web pages. Usually plain text files with embedded formatting instructions. | Internet browsers such as Internet Explorer, Crazy Browser, Mozilla Firefox and Opera.
.pdf | Portable Document Format, a document presentation format; downloads as binary. | Adobe Acrobat
.rtf | Rich Text Format, a document format that can be shared between different word processors. | Any word processing application
.txt | A plain and simple text file | Any word processing application
.doc, .dot, .abw, .lwp | Word processing files created with popular packages. | Microsoft Word (.doc), the related .dot extension for Microsoft Word Template, Abiword (.abw), and Lotus WordPro (.lwp)
IMAGES
FILE TYPE | CONTENT | APPLICATION
.gif | Graphics Interchange Format; though not the most economical, the most common graphics format found on the Internet. | Lview and many others
SOUND
FILE TYPE | CONTENT | APPLICATION
.mp3 | Audio files on both PC and Mac | Windows Media Player
.wav | Audio files on PC | Real Player
.ra | Real Audio, a proprietary system for delivering and playing streaming audio on the Web |
.aiff | Audio files on Mac |
UTILITIES
FILE TYPE | CONTENT | APPLICATION
.ppt | A presentation file (for slide shows) | Microsoft PowerPoint
.xls, .123 | Spreadsheet files | Microsoft Excel, Lotus 123
.mdb | A database file | Microsoft Access
OTHERS
FILE TYPE | CONTENT | APPLICATION
.dll | Dynamic Link Library. This is a compiled set of procedures and/or drivers called by another program. | This is a compiled system file, one that should not be moved or altered
.exe | A DOS/Windows program or a DOS/Windows Self Extracting Archive | Downloads and launches it in its own temporary directory
.zip, .sit, .tar | Various popular compression formats for the PC, Macintosh, and UNIX respectively | WinZip, ZipIt, PKzip, and others
1. What is a file?
2. What are the terms commonly used in discussing structure of a file?
3. How can you distinguish one file from another?
File Attributes
The particular information kept for each file varies from operating system to operating
system. No matter what operating system one might be using, files always have certain
attributes or characteristics. Different file attributes are discussed as follows.
File Name
The symbolic file name is the only information kept in human-readable form. A file name
helps users to differentiate between various files.
File Type
A file type is required for the systems that support different types of files. As discussed
earlier, file type is a part of the complete file name. We might have two different files; say
“cit381.doc” and “cit381.txt”. Therefore the file type is an important attribute which
helps in differentiating between files based on their types. File types indicate which
application should be used to open a particular file.
Location
This is a pointer to the device on which the file is stored and to the file's location on that
device. As the attribute name suggests, it specifies where the file is stored.
Size
The size attribute keeps track of the current size of a file in bytes, words or blocks; it is
most commonly expressed in bytes. For comparison, a floppy disk holds about
1.44 MB; a Zip disk holds 100 MB or 250 MB; a CD holds about 700
MB; a DVD holds about 4.7 GB.
Protection
Protection attribute of a file keeps track of the access-control information that
controls who can do reading, writing, executing, and so on.
Usage Count
This value indicates the number of processes that are currently using
(have opened) a particular file.
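Several of the attributes above can be read from a running system. The following is a minimal sketch using Python's standard `os.stat`; the `describe` helper and the scratch file are assumptions for the example, and the usage count is omitted because the standard library does not expose it.

```python
import os
import stat
import tempfile
import time


def describe(path: str) -> dict[str, object]:
    """Collect a few attributes the operating system keeps for a file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "type": os.path.splitext(path)[1],
        "location": os.path.abspath(path),
        "size_bytes": info.st_size,
        "protection": stat.filemode(info.st_mode),
        "modified": time.ctime(info.st_mtime),
    }


# Create a small scratch file so the sketch is runnable anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello")
    scratch_path = handle.name

attributes = describe(scratch_path)
print(attributes)
os.remove(scratch_path)
```

The exact protection string and timestamps depend on the operating system, which is consistent with the point that attribute sets vary from system to system.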
Attribute Values
In addition, all operating systems associate other information with each file. The list of
attributes varies considerably from system to system. The table below shows some of
the possibilities, but other ones also exist. No existing system has all of these, but each is
present in some system.
Source: Modern Operating Systems, 2nd ed. by Andrew S. Tanenbaum (2006).
The first four attributes relate to the file’s protection and tell who may access it and who
may not. All kinds of schemes are possible; in some systems the user must present a
password to access a file, in which case the password must be one of the attributes.
The flags are bits or short fields that control or enable some specific property. Hidden files,
for example, do not appear in listings of the files. The archive flag is a bit that keeps track of
whether the file has been backed up. The backup program clears it, and the operating
system sets it whenever a file is changed. In this way, the backup program can tell which
files need backing up. The temporary flag allows a file to be marked for automatic
deletion when the process that created it terminates.
The record length, key position, and key length fields are only present in files whose records
can be looked up using a key. They provide the information required to find the keys.
The various times keep track of when the file was created, most recently accessed and most
recently modified. These are useful for a variety of purposes. For example, a source file that
has been modified after the creation of the corresponding object file needs to be recompiled.
These fields provide the necessary information.
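The recompilation example can be sketched by comparing modification times. The helper name `needs_recompiling` and the file names `prog.c` and `prog.o` are hypothetical, and the timestamps are set by hand so that the outcome does not depend on the clock.

```python
import os
import tempfile


def needs_recompiling(source_path: str, object_path: str) -> bool:
    """True when the object file is missing or older than its source."""
    if not os.path.exists(object_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(object_path)


# Demonstration with scratch files; the names are hypothetical.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "prog.c")
obj = os.path.join(workdir, "prog.o")
for path in (source, obj):
    with open(path, "w") as handle:
        handle.write("")

os.utime(source, (1_000, 1_000))   # source last modified at t = 1000 s
os.utime(obj, (2_000, 2_000))      # object built later, at t = 2000 s
up_to_date = not needs_recompiling(source, obj)

os.utime(source, (3_000, 3_000))   # source edited after the build
stale = needs_recompiling(source, obj)
print(up_to_date, stale)
```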
The current size tells how big the file is at present. Some mainframe operating systems
require the maximum size to be specified when the file is created, to let the operating
system reserve the maximum amount of storage in advance. Minicomputers and personal
computer systems are clever enough to do without this item.
SUMMARY
A file is the basic unit of storage that enables a computer to distinguish one set of information
from another.
A file name usually has two parts, with a period character separating them.
The part on the left side of the period character is called the main name while the part on
the right side is called the extension.
The file extension shows the type of file and the application that the operating system will
use to open it.
Files have attributes which vary considerably from system to system. No existing operating
system has all of these, but each is present in some systems.
A file might or might not be stored in human-readable form, but it is invariably the “glue”
that binds a conglomeration of instructions, numbers, words, or images into a coherent unit
that a user can retrieve, delete, save, sometimes change, or send to an output device.
In this unit, we use the term file organisation to refer to the structure of a file (especially a
data file) defined in terms of its components and how they are mapped onto backing
store. Any given file organisation supports one or more file access methods. Organisation
is thus closely related to but conceptually distinct from access methods. An access method is
any algorithm used for the storage and retrieval of records from a data file; it determines
the structural characteristics of the file on which it is used.
In choosing a file organisation, several criteria are important: short access time, ease of
update, economy of storage, simple maintenance and reliability.
The relative priority of these criteria will depend on the applications that will use the file.
For example, if a file is only to be processed in batch mode, with all of the records accessed
every time, then rapid access for retrieval of a single record is of minimal concern. A file
stored on CD-ROM will never be updated, and so ease of update is not an issue. These
criteria may conflict. For example, for economy of storage, there should be minimum
redundancy in the data. On the other hand, redundancy is a primary means of increasing the
speed of access to data. An example of this is the use of indexes.
The number of alternative file organisations that have been implemented or just proposed is
unmanageably large. In this brief survey, we will outline five fundamental organisations.
Most structures used in actual systems either fall into one of these categories or can be
implemented as a combination of these organisations. The five organisations, the first four
of which are depicted in Figure 01, are:
The pile/serial
The sequential file
The indexed sequential file
The indexed file
The direct, or hashed, file
The Pile/Serial
The least-complicated form of file organisation may be termed the pile/serial. Data are
collected in the order in which they arrive. Each record consists of one burst of data. The
purpose of the pile/serial is simply to accumulate the mass of data and save it. Records may
have different fields, or similar fields in different orders. Thus, each field should be self-
describing, including a field name as well as a value. The length of each field must be
implicitly indicated by delimiters, explicitly included as a subfield, or known as default for
that field type. Because there is no structure to the pile/serial file, record access is by
exhaustive search. That is, if we wish to find a record that contains a particular field
with a particular value, it is necessary to examine each record in the pile until the desired
record is found or the entire file has been searched. If we wish to find all records that
contain a particular field or contain that field with a particular value, then the entire file must
be searched.
Pile/serial files are encountered when data are collected and stored prior to processing or
when data are not easy to organise. This type of file uses space well when the stored data
vary in size and structure, is perfectly adequate for exhaustive searches, and is easy to
update. However, beyond these limited uses, this type of file is unsuitable for most
applications.
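The exhaustive search described for the pile can be sketched as below. Each record is modelled as a Python dictionary so that every field is self-describing. The sample data and function names are made up for the illustration.

```python
from typing import Optional

Record = dict[str, str]

# Records arrive in no particular order and need not share the same fields.
pile: list[Record] = [
    {"name": "Bello", "dept": "CSC"},
    {"sensor": "T1", "reading": "36.6"},
    {"dept": "MTH", "name": "Chukwu", "level": "200"},
]


def find_first(records: list[Record], field: str, value: str) -> Optional[Record]:
    """Examine records one by one until a match is found or the pile ends."""
    for record in records:
        if record.get(field) == value:
            return record
    return None


def find_all(records: list[Record], field: str) -> list[Record]:
    """Finding every record with a given field always scans the whole pile."""
    return [record for record in records if field in record]


print(find_first(pile, "dept", "MTH"))
print(find_all(pile, "name"))
```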
The Sequential File
The most common form of file structure is the sequential file. In this file organisation, a
fixed format is used for records. All records are of the same length, consisting of the same
number of fixed-length fields in a particular order. Because the length and position of
each field are known, only the values of fields need to be stored; the field name and length
for each field are attributes of the file structure. One particular field, usually the first field in
each record, is referred to as the key field. The key field uniquely identifies the record; thus
key values for different records are always different. Further, the records are stored in
key sequence: alphabetical order for a text key, and numerical order for a numerical key.
Sequential files are typically used in batch applications and are generally optimum for such
applications if they involve the processing of all the records (e.g., a billing or payroll
application). The sequential file organisation is the only one that is easily stored on tape as
well as disk. For interactive applications that involve queries and/or updates of individual
records, the sequential file provides poor performance. Access requires the sequential
search of the file for a key match. If the entire file, or a large portion of the file, can
be brought into main memory at one time, more efficient search techniques are possible.
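One such technique is binary search, which relies on the records being in key order. The sketch below compares it with a plain sequential search on a small in-memory list. The data and function names are illustrative assumptions.

```python
from bisect import bisect_left
from typing import Optional

# A sequential file already in key order, modelled as (key, data) pairs.
records = [(1001, "Ade"), (1005, "Bola"), (1010, "Chidi"), (1042, "Dayo")]
keys = [key for key, _ in records]


def sequential_search(key: int) -> Optional[str]:
    """Scan from the start; stop early once a larger key is passed."""
    for record_key, data in records:
        if record_key == key:
            return data
        if record_key > key:
            break
    return None


def binary_search(key: int) -> Optional[str]:
    """Halve the search range each step, which the key ordering allows."""
    position = bisect_left(keys, key)
    if position < len(keys) and keys[position] == key:
        return records[position][1]
    return None


print(sequential_search(1010), binary_search(1010), binary_search(1011))
```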
An alternative is to organize the sequential file physically as a linked list. One or more
records are stored in each physical block. Each block on disk contains a pointer to the
next block. The insertion of new records involves pointer manipulation but does not
require that the new records occupy a particular physical block position. Thus, some added
convenience is obtained at the cost of additional processing and overhead.
The Indexed Sequential File
A popular approach to overcoming the disadvantages of the sequential file is the indexed
sequential file. The indexed sequential file maintains the key characteristic of the sequential
file: records are organised in sequence based on a key field. Two features are added: an index
to the file to support random access, and an overflow file.
The index provides a lookup capability to quickly reach the vicinity of a desired record. The
overflow file is similar to the log file used with a sequential file but is integrated so that a
record in the overflow file is located by following a pointer from its predecessor record.
In the simplest indexed sequential structure, a single level of indexing is used. The index in
this case is a simple sequential file. Each record in the index file consists of two fields: a
key field, which is the same as the key field in the main file, and a pointer into the main file.
To find a specific record, the index is searched to find the highest key value that is equal to
or precedes the desired key value. The search continues in the main file at the location
indicated by the pointer.
Additions to the file are handled in the following manner: Each record in the main file
contains an additional field not visible to the application, which is a pointer to the overflow
file. When a new record is to be inserted into the file, it is added to the overflow file. The
record in the main file that immediately precedes the new record in logical sequence is
updated to contain a pointer to the new record in the overflow file. If the immediately
preceding record is itself in the overflow file, then the pointer in that record is updated. As
with the sequential file, the indexed sequential file is occasionally merged with the overflow
file in batch mode.
The indexed sequential file greatly reduces the time required to access a single record,
without sacrificing the sequential nature of the file. To process the entire file sequentially,
the records of the main file are processed in sequence until a pointer to the overflow file is
found, then accessing continues in the overflow file until a null pointer is encountered, at
which time accessing of the main file is resumed where it left off.
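The lookup, insertion and sequential processing just described can be sketched in memory as follows. This is a simplified model under stated assumptions: lists stand in for the main and overflow files, list positions stand in for pointers, the index has one entry for every third record, and inserting before the first record or inserting a duplicate key is not handled.

```python
from bisect import bisect_right
from typing import Iterator, Optional

# Main file in key order; "next" is the hidden pointer into the overflow file.
main_file = [{"key": k, "data": f"rec{k}", "next": None} for k in (10, 20, 30, 40, 50, 60)]
overflow_file: list[dict] = []

# Single-level index: one (key, position) entry for every third main record.
index = [(main_file[i]["key"], i) for i in range(0, len(main_file), 3)]
index_keys = [key for key, _ in index]


def scan(start: int = 0) -> Iterator[dict]:
    """Yield records in key order, detouring through overflow chains."""
    for record in main_file[start:]:
        yield record
        pointer = record["next"]
        while pointer is not None:          # follow the chain to its null pointer
            yield overflow_file[pointer]
            pointer = overflow_file[pointer]["next"]


def find(key: int) -> Optional[dict]:
    """Use the index to reach the vicinity, then search sequentially."""
    slot = bisect_right(index_keys, key) - 1   # highest index key <= wanted key
    if slot < 0:
        return None
    for record in scan(index[slot][1]):
        if record["key"] == key:
            return record
        if record["key"] > key:
            break
    return None


def insert(key: int, data: str) -> None:
    """Add a record to the overflow file and link it after its predecessor."""
    predecessor = None
    for record in scan():
        if record["key"] > key:
            break
        predecessor = record
    if predecessor is None:
        raise ValueError("this sketch cannot insert before the first record")
    overflow_file.append({"key": key, "data": data, "next": predecessor["next"]})
    predecessor["next"] = len(overflow_file) - 1


insert(25, "rec25")
insert(27, "rec27")
print([record["key"] for record in scan()])
print(find(27))
```

Note that the second insertion updates the pointer of a record that is itself in the overflow file, matching the case mentioned in the text.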
To provide even greater efficiency in access, multiple levels of indexing can be used. Thus
the lowest level of index file is treated as a sequential file and a higher-level index file is
created for that file. Consider a file with 1 million records. A lower-level index with
10,000 entries is constructed. A higher-level index into the lower level index of 100
entries can then be constructed. The search begins at the higher-level index (average length
= 50 accesses) to find an entry point into the lower-level index. This index is then
searched (average length = 50) to find an entry point into the main file, which is then
searched (average length = 50). Thus the average length of search has been reduced from
500,000 with no index, to 1000 with a single-level index, to 150 with two levels of indexing.
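The arithmetic behind these figures can be checked with a short calculation. The sketch assumes that, on average, half of the entries at each level are examined. The middle figure of 1000 is consistent with a single-level index of 1000 entries; that index size is an assumption here, since the text does not state it.

```python
def average_search_length(level_sizes: list[int]) -> int:
    """Sum of half the entries scanned at each level, index levels first.

    level_sizes lists how many entries are scanned sequentially at each
    step, assuming on average half of them are examined.
    """
    return sum(size // 2 for size in level_sizes)


RECORDS = 1_000_000

no_index = average_search_length([RECORDS])
# One index of 1000 entries, each covering 1000 main-file records (assumed size).
one_level = average_search_length([1_000, RECORDS // 1_000])
# A 100-entry index over a 10,000-entry index over the main file.
two_level = average_search_length([100, 10_000 // 100, RECORDS // 10_000])
print(no_index, one_level, two_level)   # 500000 1000 150
```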
The Indexed File
The indexed sequential file retains one limitation of the sequential file: effective processing
is limited to that which is based on a single field of the file. For example, when it is
necessary to search for a record on the basis of some attribute other than the key field, both
forms of sequential file are inadequate. In some applications, the flexibility of efficiently
searching by various attributes is desirable.
To achieve this flexibility, a structure is needed that employs multiple indexes, one for each
type of field that may be the subject of a search. In the general indexed file, the concept of
sequentiality and a single key are abandoned. Records are accessed only through their
indexes. The result is that there is now no restriction on the placement of records as long as
a pointer in at least one index refers to that record. Furthermore, variable-length
records can be employed.
Two types of indexes are used. An exhaustive index contains one entry for every record in
the main file. The index itself is organized as a sequential file for ease of searching. A
partial index contains entries to records where the field of interest exists. With
variable-length records, some records will not contain all fields.
When a new record is added to the main file, all of
the index files must be updated. Indexed files are used mostly in applications where
timeliness of information is critical and where data are rarely processed exhaustively.
Examples are airline reservation systems and inventory control systems.
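A general indexed file can be sketched with one index per searchable field. In this illustration the flight data are made up, and each index is a Python dictionary rather than a sequential file, which is a simplification of the organisation described above.

```python
from collections import defaultdict

# Main file: records in arbitrary order, some without every field.
main_file = [
    {"flight": "W3101", "origin": "LOS", "gate": "A2"},
    {"flight": "P4710", "origin": "ABV"},
    {"flight": "W3205", "origin": "LOS", "gate": "B1"},
]


def build_index(field: str) -> dict[str, list[int]]:
    """Map each value of a field to the positions of records holding it."""
    index = defaultdict(list)
    for position, record in enumerate(main_file):
        if field in record:              # records lacking the field are skipped
            index[record[field]].append(position)
    return dict(index)


indexes = {
    "flight": build_index("flight"),     # exhaustive: every record has a flight
    "gate": build_index("gate"),         # partial: only records with a gate
}


def add_record(record: dict[str, str]) -> None:
    """Append to the main file and update every index that applies."""
    main_file.append(record)
    position = len(main_file) - 1
    for field, index in indexes.items():
        if field in record:
            index.setdefault(record[field], []).append(position)


def lookup(field: str, value: str) -> list[dict[str, str]]:
    """Reach records only through an index, never by scanning the file."""
    return [main_file[pos] for pos in indexes[field].get(value, [])]


add_record({"flight": "P4711", "origin": "KAN", "gate": "A2"})
print(lookup("gate", "A2"))
```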
The Direct or Hashed File
The direct or hashed file exploits the capability found on disks to access directly any block
of a known address. As with sequential and indexed sequential files, a key field is required
in each record. However, there is no concept of sequential ordering here. The direct file
makes use of hashing on the key value. Direct files are often used where very rapid access
is required, where fixed-length records are used, and where records are always
accessed one at a time. Examples are directories, pricing tables, schedules, and name lists.
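Hashing a key to an address can be sketched as below. The record size, the number of slots, the use of CRC-32 as the hash function and linear probing for collisions are all assumptions for the example, and an in-memory `io.BytesIO` object stands in for a preallocated disk file.

```python
import io
import zlib
from typing import Optional

RECORD_SIZE = 32          # fixed-length records: 8-byte key + 24-byte value
BUCKETS = 16              # number of record slots in the file (assumed)
EMPTY = b"\0" * RECORD_SIZE


def slot_for(key: str) -> int:
    """Hash the key to a slot number; CRC-32 is just a convenient stable hash."""
    return zlib.crc32(key.encode("ascii")) % BUCKETS


def store(file: io.BytesIO, key: str, value: str) -> None:
    """Write the record at its hashed slot, probing linearly on collision."""
    if len(key) > 8 or len(value) > 24:
        raise ValueError("key or value too long for the fixed-length record")
    record = key.encode("ascii").ljust(8, b"\0") + value.encode("ascii").ljust(24, b"\0")
    for step in range(BUCKETS):
        slot = (slot_for(key) + step) % BUCKETS
        file.seek(slot * RECORD_SIZE)              # direct access by address
        current = file.read(RECORD_SIZE)
        if current == EMPTY or current[:8] == record[:8]:
            file.seek(slot * RECORD_SIZE)
            file.write(record)
            return
    raise RuntimeError("file is full")


def fetch(file: io.BytesIO, key: str) -> Optional[str]:
    """Return the value stored under key, or None if it is absent."""
    wanted = key.encode("ascii").ljust(8, b"\0")
    for step in range(BUCKETS):
        slot = (slot_for(key) + step) % BUCKETS
        file.seek(slot * RECORD_SIZE)
        current = file.read(RECORD_SIZE)
        if current == EMPTY:
            return None
        if current[:8] == wanted:
            return current[8:].rstrip(b"\0").decode("ascii")
    return None


disk = io.BytesIO(EMPTY * BUCKETS)     # stands in for a preallocated disk file
store(disk, "bread", "450.00")
store(disk, "milk", "900.00")
print(fetch(disk, "milk"), fetch(disk, "rice"))
```

Because the record address is computed from the key, a lookup normally needs a single seek, which is why this organisation suits one-at-a-time access to fixed-length records.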
SUMMARY
File organisation refers to the logical structuring of the records as determined by the way in
which they are accessed.
Short access time, ease of update, economy of storage, simple maintenance and reliability
are important criteria in choosing a file organisation.
Major types of file organisation methods are pile, sequential file, indexed sequential file,
indexed file, direct/hashed file.
File organisation determines the applicable access methods. Access methods are principally
sequential and direct.
Each file organisation method has its peculiar advantages and disadvantages.
FILE MANAGEMENT
The file management system (FMS) is the subsystem of an operating system that manages the
data storage organisation on secondary storage, and provides services to processes related to
their access. In this sense, it interfaces the application programs with the low-level
media-I/O (e.g. disk I/O) subsystem, freeing application programmers from having to
deal with low-level intricacies and allowing them to implement I/O using convenient
data-organisational abstractions such as files and records. On the other hand, the FMS
services often are the only ways through which applications can access the data stored in the
files, thus achieving an encapsulation of the data themselves which can be usefully
exploited for the purposes of data protection, maintenance and control.
Typically, the only way that a user or application may access files is through the file
management system. This relieves the user or programmer of the necessity of developing
special-purpose software for each application and provides the system with a consistent,
well-defined means of controlling its most important asset.
Generality with respect to storage devices. The FMS data abstractions and access methods
should remain unchanged irrespective of the devices involved in data storage.
Validity. An FMS should guarantee that at any given moment the stored data reflect the
operations performed on them, regardless of the time delays involved in actually
performing those operations. Appropriate access synchronisation mechanisms should be
used to enforce validity when multiple accesses from independent processes are possible.
Performance. The above functionalities should be offered while achieving a good
compromise between data access speed and data transfer rate.
With respect to meeting user requirements, the extent of such requirements depends on the
variety of applications and the environment in which the computer system will be used. For
an interactive, general-purpose system, the following constitute a minimal set of
requirements:
Each user should be able to create, delete, read, write, and modify files.
Each user may have controlled access to other users’ files.
Each user may control what types of accesses are allowed to the user’s files.
Each user should be able to restructure the user’s files in a form appropriate to the problem.
Each user should be able to move data between files.
Each user should be able to back up and recover the user’s files in case of damage.
Each user should be able to access his or her files by name rather than by numeric
identifier.
Device Drivers
At the lowest level, device drivers communicate directly with peripheral devices
or their controllers or channels. A device driver is responsible for starting I/O operations on
a device and processing the completion of an I/O request. For file operations, the typical
devices controlled are disk and tape drives. Device drivers are usually considered to be
part of the operating system.
Logical I/O
Logical I/O enables users and applications to access records. Thus, whereas the basic file
system deals with blocks of data, the logical I/O module deals with file records. Logical I/O
provides a general-purpose record I/O capability and maintains basic data about files. The
level of the file system closest to the user is often termed the access method. It provides a
standard interface between applications and the file systems and devices that hold the data.
Different access methods reflect different file structures and different ways of accessing and
processing the data. Some of the most common access methods are shown in Figure 01, and
they have been described in the previous unit.
CLASS AND HOME ASSESSMENT
1. What is a file management system?
2. Give examples of devices controlled by device drivers.
3. What are the different functions performed by file management systems?
The basic operations that a user or application may perform on a file are performed at the
record level. The user or application views the file as having some structure that organises
the records, such as a sequential structure (e.g., personnel records are stored alphabetically
by last name). Thus, to translate user commands into specific file manipulation commands,
the access method appropriate to this file structure must be employed. Whereas users and
applications are concerned with records or fields, I/O is done on a block basis. Thus, the
records or fields of a file must be organised as a sequence of blocks for output and
unblocked after input. To support block I/O of files, several functions are needed. The
secondary storage must be managed. This involves allocating files to free blocks on
secondary storage and managing free storage so as to know what blocks are available for
new files and growth in existing files. In addition, individual block I/O requests must be
scheduled. Both disk scheduling and file allocation are concerned with optimising
performance. As might be expected, these functions therefore need to be considered
together.
Furthermore, the optimisation will depend on the structure of the files and the access
patterns. Accordingly, developing an optimum file management system from the
point of view of performance is an exceedingly complicated task.
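The blocking and unblocking of records mentioned above can be sketched for the simplest case of fixed-length records that do not span blocks. The block size and record size are arbitrary values chosen for the illustration.

```python
BLOCK_SIZE = 64      # bytes per block (assumed; real disks use larger blocks)
RECORD_SIZE = 20     # fixed-length records, so 3 fit per block with 4 bytes spare


def block_records(records: list[bytes]) -> list[bytes]:
    """Pack whole records into fixed-size blocks, padding each block's tail."""
    per_block = BLOCK_SIZE // RECORD_SIZE
    blocks = []
    for start in range(0, len(records), per_block):
        chunk = b"".join(records[start:start + per_block])
        blocks.append(chunk.ljust(BLOCK_SIZE, b"\0"))
    return blocks


def unblock(blocks: list[bytes], record_count: int) -> list[bytes]:
    """Recover the records from blocks read back from storage."""
    per_block = BLOCK_SIZE // RECORD_SIZE
    records = []
    for block in blocks:
        for slot in range(per_block):
            if len(records) == record_count:
                return records
            records.append(block[slot * RECORD_SIZE:(slot + 1) * RECORD_SIZE])
    return records


records = [f"record-{n:02d}".encode("ascii").ljust(RECORD_SIZE, b".") for n in range(7)]
blocks = block_records(records)
print(len(blocks), unblock(blocks, len(records)) == records)
```

The unused bytes at the end of each block illustrate the trade-off between simple addressing and wasted space.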
Figure 3 suggests a division between what might be considered the concerns of the file
management system as a separate system utility and the concerns of the operating system,
with the point of intersection being record processing. This division is arbitrary; various
approaches are taken in various systems.
Retrieve_All
Retrieve all the records of a file. This will be required for an application that must process
all of the information in the file at one time. For example, an application that produces a
summary of the information in the file would need to retrieve all records. This
operation is often equated with the term sequential processing, because all of the records
are accessed in sequence.
Retrieve_One
This requires the retrieval of just a single record. Interactive, transaction-oriented
applications need this operation.
Retrieve_Next
This requires the retrieval of the record that is “next” in some logical sequence to the most
recently retrieved record. Some interactive applications, such as filling in forms, may
require such an operation. A program that is performing a search may also use this
operation.
Retrieve_Previous
Similar to Retrieve_Next, but in this case the record that is “previous” to the currently
accessed record is retrieved.
Insert_One
Insert a new record into the file. It may be necessary that the new record fit into a particular
position to preserve a sequencing of the file.
Delete_One
Delete an existing record. Certain linkages or other data structures may need to be updated
to preserve the sequencing of the file.
Update_One
Retrieve a record, update one or more of its fields, and rewrite the updated record back into
the file. Again, it may be necessary to preserve sequencing with this operation. If the length
of the record has changed, the update operation is generally more difficult than if the length
is preserved.
Retrieve_Few
Retrieve a number of records. For example, an application or user may wish to retrieve all
records that satisfy a certain set of criteria.
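The operations listed above can be sketched as methods of a small in-memory class. The class name `KeyedFile`, its method names and the sample staff data are hypothetical, and a sorted list of keys stands in for the file's sequencing.

```python
from bisect import bisect_left, insort
from typing import Callable, Optional


class KeyedFile:
    """In-memory stand-in for a file whose records are kept in key order."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._records: dict[str, dict] = {}
        self._current: Optional[int] = None     # position of last retrieval

    def insert_one(self, key: str, record: dict) -> None:
        if key in self._records:
            raise KeyError(f"duplicate key: {key}")
        insort(self._keys, key)                 # keep the key sequence sorted
        self._records[key] = record
        self._current = None                    # positions may have shifted

    def retrieve_all(self) -> list[dict]:
        return [self._records[key] for key in self._keys]

    def retrieve_one(self, key: str) -> Optional[dict]:
        position = bisect_left(self._keys, key)
        if position == len(self._keys) or self._keys[position] != key:
            return None
        self._current = position
        return self._records[key]

    def retrieve_next(self) -> Optional[dict]:
        return self._step(+1)

    def retrieve_previous(self) -> Optional[dict]:
        return self._step(-1)

    def retrieve_few(self, wanted: Callable[[dict], bool]) -> list[dict]:
        return [record for record in self.retrieve_all() if wanted(record)]

    def update_one(self, key: str, **changes) -> None:
        self._records[key].update(changes)

    def delete_one(self, key: str) -> None:
        del self._records[key]
        self._keys.remove(key)
        self._current = None

    def _step(self, offset: int) -> Optional[dict]:
        if self._current is None:
            return None
        position = self._current + offset
        if not 0 <= position < len(self._keys):
            return None
        self._current = position
        return self._records[self._keys[position]]


staff = KeyedFile()
for name, dept in [("Musa", "CSC"), ("Bello", "MTH"), ("Eze", "CSC")]:
    staff.insert_one(name, {"name": name, "dept": dept})
staff.retrieve_one("Bello")
print(staff.retrieve_next()["name"])
print([r["name"] for r in staff.retrieve_few(lambda r: r["dept"] == "CSC")])
```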
The nature of the operations that are most commonly performed on a file will influence the
way the file is organised, as discussed under file organisation. It
should be noted that not all file systems exhibit the sort of structure discussed in this
subsection. On UNIX and UNIX-like systems, the basic file structure is just a stream of
bytes. For example, a C program is stored as a file but does not have physical fields,
records, and so on.
SUMMARY
File management system is the subsystem of an operating system that manages the data
storage organisation on secondary storage, and provides services to processes related to file
access.
FMS objectives include data management, protection of files against dangerous operations,
control over, etc.
Different operations are supported by FMS which include retrieve_one, retrieve_all,
and so on.
To a user, the FMS serves as an interface for file creation and deletion, file ownership
and access control, logical identification of data, and technical failure prevention as a
result of data redundancy.
FILE DIRECTORIES
Concept of File Directory
To keep track of files, the file system normally provides directories, which, in many systems
are themselves files. The structure of the directories and the relationship among them are the
main areas where file systems tend to differ, and it is also the area that has the most
significant effect on the user interface provided by the file system.
Table 8 on the next page suggests the information typically stored in the directory for each file in
the system. From the user’s point of view, the directory provides a mapping between file
names, known to users and applications, and the files themselves. Thus, each file entry
includes the name of the file. Virtually all systems deal with different types of files and
different file organisations, and this information is also provided. An important category of
information about each file concerns its storage, including its location and size. In shared
systems, it is also important to provide information that is used to control access to the file.
Typically, one user is the owner of the file and may grant certain access privileges to other
users. Finally, usage information is needed to manage the current use of the file and to record
the history of its usage.
• Single-Level Directory
• Two-Level Directory
• Tree-Structured Directory
Single-Level Directory
In a single-level directory system, all the files are placed in one directory. This is
very common on single-user operating systems. A single-level directory has significant
limitations when the number of files increases or when there is more than one user.
Since all files are in the same directory, they must have unique names. If there are two
users who call their data file “cit381note.doc”, then the unique-name rule is violated.
Even with a single user, as the number of files increases, it becomes difficult to
remember the names of all the files in order to create only files with unique names.
Two-Level Directory
In the two-level directory system, the system maintains a master block that has one
entry for each user. This master block contains the addresses of the users’
directories. There are still problems with the two-level directory structure. This structure
effectively isolates one user from another. The isolation eliminates name conflicts among
users, which is an advantage when users are completely independent, but it is a
disadvantage when users want to cooperate on some task and access one another’s
files. Some systems simply do not allow local files to be accessed by other users. The
structure is also unsatisfactory for users with many files, because users commonly
want to group their files together in a logical way.
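The behaviour described above can be sketched with a toy in-memory model. The class `TwoLevelDirectory` and its method names are hypothetical and chosen only for this illustration; a real file system stores disk addresses rather than Python dictionaries, and the strict "owner only" rule is just one possible policy.

```python
class TwoLevelDirectory:
    """Toy model: a master block maps each user to that user's own directory."""

    def __init__(self):
        self.master = {}  # user name -> {file name -> file contents}

    def add_user(self, user):
        self.master.setdefault(user, {})

    def create(self, user, name, contents=""):
        directory = self.master[user]
        # Names must be unique only within one user's directory.
        if name in directory:
            raise FileExistsError(f"{user} already has a file named {name}")
        directory[name] = contents

    def read(self, user, owner, name):
        # Users are isolated: in this sketch only the owner may read a file.
        if user != owner:
            raise PermissionError(f"{user} cannot access files of {owner}")
        return self.master[owner][name]
```

Two users can now each create a file called `cit381note.doc` without conflict, but neither can read the other's copy, which shows both the advantage and the disadvantage of the structure.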
Tree-Structured Directory
In the tree-structured directory, the directories themselves are considered as files. This leads to
the possibility of having sub-directories that can contain files and sub-subdirectories. An
interesting policy decision in a tree-structured directory structure is how to handle the
deletion of a directory. If a directory is empty, its entry in its containing directory can simply
be deleted. However, if the directory to be deleted is not empty but contains several
files or sub-directories, deletion becomes more problematic. Some systems will not delete a
directory unless it is empty. Thus, to delete a directory, someone must first delete all the files
in that directory. If there are any sub-directories, this procedure must be applied recursively to
them so that they can be deleted too. This approach may result in a substantial amount of
work. An alternative approach is to assume that when a request is made to delete a
directory, all of that directory’s files and sub-directories are also to be deleted. The
tree-structured directory is the most common directory structure.
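The two deletion policies can be tried out with Python's standard library, where `os.rmdir` refuses to remove a non-empty directory and `shutil.rmtree` removes a directory together with everything below it. The function name `delete_policies_demo` and the directory and file names are assumptions made for this sketch.

```python
import os
import shutil
import tempfile


def delete_policies_demo():
    """Return (strict_delete_failed, recursive_delete_succeeded)."""
    root = tempfile.mkdtemp()
    target = os.path.join(root, "course")        # hypothetical directory
    os.makedirs(os.path.join(target, "notes"))   # a sub-directory
    with open(os.path.join(target, "notes", "week1.txt"), "w") as handle:
        handle.write("file systems")

    # Policy 1: refuse to delete a directory unless it is empty.
    try:
        os.rmdir(target)
        strict_failed = False
    except OSError:
        strict_failed = True

    # Policy 2: delete the directory together with everything below it.
    shutil.rmtree(target)
    recursive_ok = not os.path.exists(target)

    os.rmdir(root)  # root is empty now, so the strict delete works
    return strict_failed, recursive_ok
```

The second policy is convenient but dangerous, since a single request can remove a whole sub-tree.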
Source: Operating System Concepts with Java, 6th ed., by Abraham
Silberschatz and others (2004).
Acyclic-Graph Directories
Path Names
When a file system is organized as a directory tree, some way is needed for specifying the
filenames. The use of a tree-structured directory minimizes the difficulty in assigning
unique names. Any file in the system can be located by following a path from the root or
master directory down various branches until the file is reached. The series of directory
names, culminating in the file name itself, constitutes a pathname for the file. Two different
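methods commonly used are:
• Absolute path name: the path from the root directory to the file.
• Relative path name: the path from the current (working) directory to the file.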
Note that absolute file names always start at the root directory and are unique. In UNIX the
components of the path are separated by /. In MS-DOS the separator is \. In
MULTICS it is >. No matter which character is used, if the first character of the path
name is the separator, then the path is absolute.
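The "first character is the separator" rule can be checked with Python's `posixpath` module, which applies UNIX path conventions on any platform. The sketch also resolves a path that does not start with the separator against a working directory, which is the usual treatment of relative path names and is an assumption added here; the function name `resolve` and the example paths are hypothetical.

```python
import posixpath


def resolve(path, working_directory):
    """Return the absolute path name that `path` refers to."""
    if posixpath.isabs(path):  # the path starts with the separator "/"
        return posixpath.normpath(path)
    # Otherwise interpret the path starting from the working directory.
    return posixpath.normpath(posixpath.join(working_directory, path))


def demo():
    """Resolve one absolute and two relative (hypothetical) path names."""
    return [
        resolve("/home/ada/notes.txt", "/tmp"),
        resolve("notes.txt", "/home/ada"),
        resolve("../bayo/exam.doc", "/home/ada"),
    ]
```

An absolute path gives the same result from any working directory, whereas the result for a relative path changes with the working directory.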
[Figure: example directory tree. Surviving labels: FILE-A, FILE-B, STUDENT, PG, TEST.TXT,
EXAM.DOC, DEPT.RTF, I.D.MDB, NASU, ARREARS.REC]
The operating system provides system calls to create, write, read, reposition, truncate and
delete files. The following sub-units discuss the specific duties a file system must perform for
each of the following basic file operations.
File Operations
The following are the various operations that can take place on a file:
a. Creating a File
When creating a file, a space in the file system must be found for the file and then an entry
for the new file must be made in the directory. The directory entry records the name of the
file and the location in the file system.
b. Opening a File
Before using a file, a process must open it. The purpose of the OPEN call is to allow the
system to fetch the attributes and list of secondary storage disk addresses into main memory
for rapid access on subsequent calls.
c. Closing a File
When all the accesses are finished, the attributes and secondary storage addresses are no
longer needed, so the file should be closed to free up internal table space. Many systems
encourage this by imposing a maximum number of open files on processes.
d. Writing a File
To write a file, a system call is made specifying both the name of the file and the
information to be written to the file. Given the name of the file, the system searches the
directory to find the location of the file. The directory entry must store a write pointer to
the current block of the file (initially, usually the beginning of the file). Using this pointer, the
address of the next block, where the information will be written, can be computed. The write
pointer must be updated after each write so that successive writes can be used to write a
sequence of blocks to the file. It is also important to make sure that existing data is not
overwritten in an append operation, i.e. when a block of data is added at the
end of an already existing file.
e. Reading a File
To read a file, a system call is made that specifies the name of the file and where (in
memory) the next block of the file should be put. Again, the directory is searched for the
associated directory entry, and the directory entry will need a pointer to the next block to be
read. Once the block is read, the pointer is updated.
f. Repositioning a File
When repositioning a file, the directory is searched for the appropriate entry, and the current
file position is set to a given value. This file operation is also called file seek.
g. Truncating a File
The user may erase some contents of a file but keep its attributes. Rather than forcing the
user to delete the file and then recreate it, this operation allows all the attributes to remain
unchanged, except the file size.
h. Deleting a File
To delete a file, the directory is searched for the named file. Having found the associated
directory entry, the system releases the space allocated to the file (so it can be reused by other
files) and invalidates the directory entry.
i. Renaming a File
It frequently happens that a user needs to change the name of an existing file. This system
call makes that possible. It is not always strictly necessary, because the file can always be
copied to a new file with the new name, and the old file then deleted.
j. Appending a File
This call is a restricted form of the WRITE call. It can only add data to the end of the file.
Systems that provide a minimal set of system calls do not generally have APPEND, but many
systems provide multiple ways of doing the same thing, and these systems sometimes have
APPEND.
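The operations listed above map fairly directly onto low-level calls in Python's `os` module, which in turn wrap the operating system's own system calls. The sketch below is one possible walk-through on a scratch file; the function name `file_operations_demo` and the file names are assumptions made for the example.

```python
import os
import tempfile


def file_operations_demo():
    """Run the basic operations on a scratch file and return what was read."""
    folder = tempfile.mkdtemp()
    path = os.path.join(folder, "demo.dat")       # hypothetical file name
    renamed = os.path.join(folder, "final.dat")   # hypothetical new name

    fd = os.open(path, os.O_CREAT | os.O_RDWR)    # create and open
    os.write(fd, b"hello world")                  # write at the current position
    os.lseek(fd, 6, os.SEEK_SET)                  # reposition (seek) to byte 6
    tail = os.read(fd, 5)                         # read the next 5 bytes
    os.ftruncate(fd, 5)                           # truncate: keep the first 5 bytes
    os.close(fd)                                  # close, freeing the table entry

    fd = os.open(path, os.O_WRONLY | os.O_APPEND)  # append: writes go to the end
    os.write(fd, b"!")
    os.close(fd)

    os.rename(path, renamed)                      # rename
    with open(renamed, "rb") as handle:
        final = handle.read()
    os.remove(renamed)                            # delete the file
    os.rmdir(folder)
    return tail, final
```

Note that the read and write calls take a file descriptor rather than a file name: the name is looked up once, at open time, and later calls use the descriptor.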
The ten operations described comprise only the minimal set of required file operations.
Others may include copying and executing a file. Also of use are facilities to lock
sections of an open file for multiprogramming access, to share sections, and even to
map sections into memory on virtual-memory systems. This last function allows a part of the
virtual address space to be logically associated with a section of a file. Reads and writes to that
memory region are then treated as reads and writes to the file.
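Memory mapping can be tried with Python's standard `mmap` module. In the sketch below, a write to the mapped region ends up in the file without any explicit write call on the file itself; the function name `mapped_write_demo` and the file name are assumptions made for this example.

```python
import mmap
import os
import tempfile


def mapped_write_demo():
    """Change a file through a memory mapping and return the file's contents."""
    folder = tempfile.mkdtemp()
    path = os.path.join(folder, "mapped.dat")  # hypothetical file name
    with open(path, "wb") as handle:
        handle.write(b"hello world")

    with open(path, "r+b") as handle:
        # Length 0 maps the whole file into the process's address space.
        with mmap.mmap(handle.fileno(), 0) as region:
            region[0:5] = b"HELLO"  # a memory write ...
            region.flush()          # ... pushed back to the file

    with open(path, "rb") as handle:
        contents = handle.read()
    os.remove(path)
    os.rmdir(folder)
    return contents
```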
Directory Operations
When considering a particular directory structure, we need to keep in mind the operations
that are to be performed on a directory.
a. Create a File
New files need to be created and added to the directory.
b. Delete a File
When a file is no longer needed, we want to remove it from the directory. A directory
itself can usually be deleted only if it is empty.
c. Open a Directory
Directories can be read. For example, to list all files in a directory, a listing program opens
the directory to read out the names of all the files it contains. Before a directory can be
read, it must be opened.
d. Close a Directory
When a directory has been read, it should be closed to free up internal table space.
e. Read a Directory
This call returns the next entry in an open directory. Formerly, it was possible to read
directories using the usual READ system call, but that approach has the disadvantage of
forcing the programmer to know and deal with the internal structure of directories. In
contrast, READDIR always returns one entry in a standard format, no matter which of the
possible directory structures is being used.
f. Rename a File
Because the name of a file represents its contents to its users, the name must be changeable
when the contents or use of the file changes. Renaming a file may also allow its
position within the directory structure to be changed.
g. Search for a File
We need to be able to search a directory structure to find the entry for a particular file.
h. List a Directory
We need to list the files in a directory and the contents of the directory entry for each file
in the list.
Note that the above list gives the most important operations, but there are a few others
as well, for example, for managing the protection information associated with a
directory.
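The directory operations above can be sketched with Python's `os` module. Here `os.scandir` plays the role of opening a directory, reading it one entry at a time and closing it again; the function name `directory_operations_demo` and the directory and file names are assumptions made for this example.

```python
import os
import tempfile


def directory_operations_demo():
    """Create, list, search, rename and delete entries in a scratch directory."""
    root = tempfile.mkdtemp()
    folder = os.path.join(root, "csc202")            # hypothetical name
    os.mkdir(folder)                                 # create a directory
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(folder, name), "w"):  # create files in it
            pass

    # Open the directory, read one entry at a time, then close it.
    with os.scandir(folder) as entries:
        listing = sorted(entry.name for entry in entries)

    found = "b.txt" in listing                       # search for a file
    os.rename(os.path.join(folder, "a.txt"),
              os.path.join(folder, "c.txt"))         # rename a file
    for name in os.listdir(folder):                  # delete each file ...
        os.remove(os.path.join(folder, name))
    os.rmdir(folder)                                 # ... then the empty directory
    os.rmdir(root)
    return listing, found
```

Each entry returned by `os.scandir` has the same standard form regardless of how the underlying file system stores its directories.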
File Sharing
In a multiuser system, there is almost always a requirement for allowing files to be shared
among a number of users. Two issues arise: access rights and the management of
simultaneous access.
Access Rights
The file system should provide a flexible tool for allowing extensive file sharing among
users. The file system should provide a number of options so that the way in which a
particular file is accessed can be controlled. Typically, users or groups of users are granted
certain access rights to a file. A wide range of access rights are in use. The following list is
representative of access rights that can be assigned to a particular user for a particular file:
None: The user may not even know of the existence of the file, let alone access it.
To enforce this restriction, the user would not be allowed to read the user directory that
contains this file.
Knowledge: The user can determine that the file exists and who its owner is. The user
is then able to petition the owner for additional access rights.
Execution: The user can load and execute a program but cannot copy it. Proprietary
programs are often made accessible with this restriction.
Reading: The user can read the file for any purpose, including copying and execution.
Some systems are able to enforce a distinction between viewing and copying. In the former
case, the contents of the file can be displayed to the user, but the user has no means for
making a copy.
Appending: The user can add data to the file, often only at the end, but cannot modify
or delete any of the file’s contents. This right is useful in collecting data from a number of
sources.
Updating: The user can modify, delete, and add to the file’s data. This normally
includes writing the file initially, rewriting it completely or in part, and removing all
or a portion of the data. Some systems distinguish among different degrees of updating.
Changing protection: The user can change the access rights granted to other users.
Typically, this right is held only by the owner of the file. In some systems, the owner
can extend this right to others. To prevent abuse of this mechanism, the file owner
will typically be able to specify which rights can be changed by the holder of this
right.
Deletion: The user can delete the file from the file system.
These rights can be considered to constitute a hierarchy, with each right implying those
that precede it. Thus, if a particular user is granted the updating right for a particular file,
then that user is also granted the following rights: knowledge, execution, reading, and
appending.
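One simple way to model such a hierarchy is to number the rights in the order listed and compare the numbers. The sketch below does this with an `IntEnum`; the names `Right`, `allows` and `implied_rights` are hypothetical, and real systems often use independent permission bits rather than a strict ordering.

```python
from enum import IntEnum


class Right(IntEnum):
    """Access rights in the order listed; a larger value implies the smaller ones."""
    NONE = 0
    KNOWLEDGE = 1
    EXECUTION = 2
    READING = 3
    APPENDING = 4
    UPDATING = 5
    CHANGING_PROTECTION = 6
    DELETION = 7


def allows(granted, requested):
    """True if holding `granted` implies the `requested` right."""
    return granted >= requested


def implied_rights(granted):
    """List the names of every right implied by `granted`, including itself."""
    return [right.name for right in Right if Right.NONE < right <= granted]
```

For example, `implied_rights(Right.UPDATING)` lists knowledge, execution, reading and appending, followed by updating itself.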
One user is designated as owner of a given file, usually the person who initially created
the file. The owner has all of the access rights listed previously and may grant rights to
others. Access can be provided to different classes of users:
Specific user: Individual users who are designated by user ID.
User groups: A set of users who are not individually defined. The system must have some way of
keeping track of the membership of user groups.
All: All users who have access to this system. These are public files.
Simultaneous Access
When access is granted to append or update a file to more than one user, the operating
system or file management system must enforce discipline. A brute-force approach is to
allow a user to lock the entire file when it is to be updated. A finer grain of control is
to lock individual records during update.
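The difference between the two approaches can be sketched with threads standing in for users. The class `RecordStore` and the function `run_updates` are hypothetical names for a toy in-memory "file" of integer records; a real system would lock byte ranges or records on disk. A given store should use one of the two locking methods consistently, since mixing them would not protect a record.

```python
import threading


class RecordStore:
    """Toy file of integer records with a whole-file lock and per-record locks."""

    def __init__(self, record_count):
        self.records = [0] * record_count
        self.file_lock = threading.Lock()  # coarse: one lock for the whole file
        self.record_locks = [threading.Lock() for _ in range(record_count)]

    def update_with_file_lock(self, index, amount):
        with self.file_lock:               # blocks every other updater
            self.records[index] += amount

    def update_with_record_lock(self, index, amount):
        with self.record_locks[index]:     # blocks only updaters of this record
            self.records[index] += amount


def run_updates(store, updates_per_thread=1000):
    """Four threads update two records concurrently using record locks."""
    def worker(index):
        for _ in range(updates_per_thread):
            store.update_with_record_lock(index, 1)

    threads = [threading.Thread(target=worker, args=(i % 2,)) for i in range(4)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store.records
```

With record locks, threads working on different records never wait for each other; with the file lock, every update waits for every other update.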
HOME AND CLASS EXERCISE
SUMMARY
In this lecture, you have learnt that:
• Different operations can be performed on files and directories.
• Examples of operations on files include creating, reading, closing, writing, opening,
repositioning, renaming, and others.
• Creating, deleting, opening, closing and reading are some of the operations that can be
performed on directories.
• The file system provides a number of options with regard to the right to access a file, so as to
control the way in which a particular file is accessed.
• More than one user can be granted access to a file, but with some level of discipline.