7.3 Section 3 File Organisation
OWEN NYAMAROPA
HIERARCHY OF DATA
BIT All data is stored in a computer’s memory or storage devices in the form of binary
digits or bits. A bit can be either ‘ON’ or ‘OFF’ representing 1 or 0.
BYTE A group of eight bits. One byte represents one character or, in other contexts,
other data such as a sound or part of a picture. The most common code used to
represent characters is ASCII (American Standard Code for Information Interchange).
FIELD Characters are grouped together to form fields. Data held about a person can be
split into many fields, e.g. ID Number, Surname, First Name, Address, DOB, etc.
RECORD All the information about one person or item is held in a record.
FILE A file is a collection of records. A stock file will contain a record for each item of
stock, a payroll file a record for each employee and so on.
DATABASE A database may consist of many different files, linked together so that information
can be retrieved from several files simultaneously.
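As a rough illustration of the hierarchy, the following Python sketch builds each level from the one below it. The field names and sample values are invented for the example and do not come from any real file.

```python
# Bit and byte: the character 'A' is stored as one byte (ASCII code 65).
code = ord("A")
bits = format(code, "08b")      # the eight bits that make up that byte

# Field: a group of characters, here encoded as ASCII bytes.
surname = "Moyo".encode("ascii")

# Record: all the fields held about one person (hypothetical sample data).
record = {"id_number": "1001", "surname": "Moyo", "first_name": "Tariro"}

# File: a collection of records.
staff_file = [
    record,
    {"id_number": "1002", "surname": "Dube", "first_name": "Farai"},
]
```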
Fixed records
When data is stored in fixed length records, the same number of bytes is allocated to each data
item (field) with no reference to how much data is stored.
Advantages
Data can be updated in situ
It’s possible to estimate file size accurately when a new system is being designed.
Disadvantages
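It wastes space on the storage medium, because short data items are padded to the full field length.
Data that is longer than the allocated field length is truncated.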
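The trade-off can be seen in a short Python sketch that uses the standard struct module. The record layout (4-byte ID, 10-byte surname, 8-byte date of birth) and the figure of 5000 records are assumptions chosen only for illustration.

```python
import struct

# Hypothetical layout: 4-byte ID, 10-byte surname, 8-byte date of birth.
RECORD_FORMAT = "4s10s8s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)   # same size for every record


def pack_record(id_number, surname, dob):
    """Pack three text fields into one fixed-length record."""
    # struct pads short values with zero bytes and cuts off long ones.
    return struct.pack(
        RECORD_FORMAT,
        id_number.encode("ascii"),
        surname.encode("ascii"),
        dob.encode("ascii"),
    )


short = pack_record("0001", "Moyo", "19990131")
long_ = pack_record("0002", "Featherstonehaugh", "20010715")

wasted = 10 - len("Moyo")             # padding bytes in the surname field
stored_surname = long_[4:14]          # only the first 10 characters survive
estimated_size = 5000 * RECORD_SIZE   # file size for an assumed 5000 records
```

Because every record occupies exactly RECORD_SIZE bytes, the file size follows directly from the number of records, which is why it can be estimated at design time.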
Transaction file
A transaction file is a collection of records used in batch processing to update a master file.
It contains data of all transactions that have occurred in the last period. A period may be a day, a
week, or a month.
Master files
Master files are permanent files of data which are a principal source of information for a job.
They are kept up-to-date by applying the transactions that occur during the operation of
the business. They contain two basic types of data:
Data of a more or less permanent nature such as name, address, rate of pay etc.
Data which will change every time transactions are applied to the file, e.g. gross pay to date,
tax paid to date, etc.
File organisation
Files stored on magnetic media can be organised in a number of ways. The method chosen will
depend on several factors such as:
How the file is to be used.
How many records are processed each time the file is updated.
Whether individual records need to be quickly accessed.
Serial file organisation
Records on serial files are not in any particular sequence.
Records are stored in the order in which they are received, with new records added to the end
of the file.
Serial files are used as temporary files to store transaction data.
Access method
Serial access means that records are read from the disk into main memory one after the other, i.e.
in the order in which they occur on the disk.
The method can be used with magnetic tape.
Adding a record – new records are simply appended to the end of the file.
Deleting a record – create a new tape, copy all the records up to the record to be deleted, leaving
that one off the new tape and then copy out all the rest of the records to the new tape.
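The two operations can be sketched in Python, with a list standing in for the tape. The record layout and the function names are assumptions made for the example.

```python
def append_record(serial_file, record):
    """Add a record: it simply goes on the end of the file."""
    serial_file.append(record)


def delete_record(old_file, key):
    """Delete by copying: build a new file that omits the unwanted record."""
    new_file = []
    for record in old_file:          # read the records one after another
        if record["key"] != key:     # leave the deleted record off the copy
            new_file.append(record)
    return new_file


transactions = []
append_record(transactions, {"key": 17, "item": "bolts"})
append_record(transactions, {"key": 4, "item": "nuts"})
append_record(transactions, {"key": 9, "item": "washers"})

# The records stay in arrival order; only key 4 is missing from the copy.
transactions = delete_record(transactions, 4)
```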
Uses of transaction file
Used as a transaction file for recording data in the order in which events take place.
Used to update the master file (MF) and to restore data in the event of a disaster such as a disk head crash.
Adding a record – make a new copy of the file: copy records across until the position where the new
record belongs in sequence, write the new record there, and then copy over the rest of the records.
Deleting a record – create a new tape/disk and copy all the records to it, leaving out the
record to be deleted.
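Adding a record in its proper place can be sketched as follows, assuming the records are held in ascending order of a key field. The field name "key" and the function name are hypothetical.

```python
def insert_in_sequence(old_file, new_record):
    """Copy old_file to a new file, writing new_record at its proper place."""
    new_file = []
    inserted = False
    for record in old_file:
        # Write the new record just before the first record with a larger key.
        if not inserted and new_record["key"] < record["key"]:
            new_file.append(new_record)
            inserted = True
        new_file.append(record)
    if not inserted:                 # the new key is larger than all the others
        new_file.append(new_record)
    return new_file


stock = [{"key": 10}, {"key": 20}, {"key": 40}]
stock = insert_in_sequence(stock, {"key": 30})
stock = insert_in_sequence(stock, {"key": 50})
```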
Updating sequential files
An MF is updated when one or more records are altered by applying a transaction, or a file of
transactions, to it.
The method used to update a sequential file is called updating by copying.
It requires the transaction file (TF) to be sorted in the same order as the MF.
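One possible way to carry out updating by copying is sketched below in Python. It is only a sketch under stated assumptions: both files are lists sorted on a field called "key", and each transaction carries an "action" of "add", "amend" or "delete". None of these names come from the notes.

```python
def update_by_copying(master, transactions):
    """Return a new master file; both inputs must be sorted by "key"."""
    new_master = []
    t = 0  # position of the next unread transaction

    for record in master:
        # Transactions with a smaller key refer to records not yet on file.
        while t < len(transactions) and transactions[t]["key"] < record["key"]:
            if transactions[t]["action"] == "add":
                new_master.append(
                    {"key": transactions[t]["key"], **transactions[t]["data"]}
                )
            t += 1

        current = dict(record)  # copy, so the old master is left unchanged
        keep = True
        # Apply every transaction whose key matches this master record.
        while t < len(transactions) and transactions[t]["key"] == record["key"]:
            if transactions[t]["action"] == "amend":
                current.update(transactions[t]["data"])
            elif transactions[t]["action"] == "delete":
                keep = False
            t += 1
        if keep:
            new_master.append(current)

    # Any transactions left over have keys beyond the end of the old master.
    for transaction in transactions[t:]:
        if transaction["action"] == "add":
            new_master.append({"key": transaction["key"], **transaction["data"]})
    return new_master


old_master = [
    {"key": 1, "balance": 100},
    {"key": 3, "balance": 250},
    {"key": 5, "balance": 75},
]
transaction_file = [
    {"key": 2, "action": "add", "data": {"balance": 40}},
    {"key": 3, "action": "amend", "data": {"balance": 300}},
    {"key": 5, "action": "delete", "data": {}},
    {"key": 7, "action": "add", "data": {"balance": 10}},
]
new_master = update_by_copying(old_master, transaction_file)
```

Because both files are in the same key order, each is read only once from start to finish, and the old master survives unchanged as a backup.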
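[Figure: Grandfather-Father-Son method of updating. The Day 1 MF (grandfather) and the Day 1 TF go through an update run to produce the Day 2 MF (father); the Day 2 MF and the Day 2 TF go through a second update run to produce the Day 3 MF (son).]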
NB Three versions of the MF will be created, and this is called grandfather-father-son.
Algorithm to update a sequential master file
Adding a record
Apply a hashing algorithm to the key field to generate a storage address. E.g. the address of record
75481 would be calculated as follows:
75481 / 1000 = 75 remainder 481 (where 1000 is the number of records to be stored).
Address = 481
If the address is already occupied, the record can be put in the next available place, or a tag can be
left in the original location and the record placed in an overflow area.
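The division-remainder calculation and the "next available place" rule can be sketched in Python as follows. The overflow-area alternative is not shown, and wrapping round to the start of the file when the end is reached is an assumption of this sketch. The names TABLE_SIZE, hash_address and add_record are hypothetical.

```python
TABLE_SIZE = 1000   # number of record positions, as in the example above


def hash_address(key):
    """Division-remainder hashing: the remainder is the storage address."""
    return key % TABLE_SIZE


def add_record(table, key, data):
    """Store a record at its hashed address or the next free place after it."""
    address = hash_address(key)
    for step in range(TABLE_SIZE):
        slot = (address + step) % TABLE_SIZE   # wrap round at the end (assumed)
        if table[slot] is None:
            table[slot] = (key, data)
            return slot
    raise RuntimeError("file is full")


table = [None] * TABLE_SIZE
first = add_record(table, 75481, "record A")    # hashes to address 481
second = add_record(table, 12481, "record B")   # also hashes to 481: a collision
```

Here the second record collides with the first and is placed in the next free position.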
Properties of a good hashing algorithm
The algorithm should be chosen so that:
It can generate any of the available addresses on the file;
It is fast to calculate;
It minimises collisions (synonyms).
Deleting a record
Leave the deleted record in place and set a flag labelling it as deleted. The record will be logically
deleted but physically present.
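A minimal Python sketch of logical deletion is given below. It assumes each stored record carries a boolean "deleted" field; that field name and the function names are illustrative only.

```python
def delete_record(file_area, address):
    """Logically delete the record stored at the given address."""
    file_area[address]["deleted"] = True   # the data itself is left in place


def find_record(file_area, address):
    """Return the record at an address, or None if empty or flagged deleted."""
    record = file_area[address]
    if record is None or record["deleted"]:
        return None
    return record


file_area = [None] * 1000
file_area[481] = {"key": 75481, "name": "Moyo", "deleted": False}

delete_record(file_area, 481)
```

After the call, the position still holds the record's data, but any lookup treats the record as absent.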
Updating the file
A record is read into memory, updated, and written back to its original location.
This is called updating by overlay or updating in place or updating in situ.
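Updating in place is straightforward when records have a fixed length, because a record's position in the file can be calculated directly. The sketch below assumes a hypothetical layout of key, 10-byte name and balance, and uses a temporary file so that it can be run safely.

```python
import os
import struct
import tempfile

RECORD_FORMAT = "<i10sd"   # hypothetical layout: key, 10-byte name, balance
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def update_in_place(path, position, amount):
    """Read one record, change its balance, and overwrite it where it lies."""
    with open(path, "r+b") as f:
        f.seek(position * RECORD_SIZE)           # go straight to the record
        key, name, balance = struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE))
        balance += amount                        # update it in memory
        f.seek(position * RECORD_SIZE)           # return to the same location
        f.write(struct.pack(RECORD_FORMAT, key, name, balance))


# Build a small two-record file to work on.
path = os.path.join(tempfile.mkdtemp(), "master.dat")
with open(path, "wb") as f:
    for key, name, balance in [(1, b"Moyo", 100.0), (2, b"Dube", 250.0)]:
        f.write(struct.pack(RECORD_FORMAT, key, name, balance))

update_in_place(path, 1, 50.0)   # add 50 to the second record's balance

with open(path, "rb") as f:
    f.seek(1 * RECORD_SIZE)
    updated = struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE))
```

No copy of the file is made; only the bytes of the one record are rewritten.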
Blocks
Both disks and tapes transfer data between the CPU and backing store in chunks called blocks.
The blocking factor is the number of records stored in each block.
A block can be called a sector on a disk.
A user must specify the number of records in each sector/block when setting up an indexed sequential
file.
A blocking strategy is to put several records in one block, but to leave enough free space for
extra records to fit in each block before overflow occurs.
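The effect of leaving free space can be worked out with a little arithmetic. The Python sketch below uses made-up figures (1000 records, a blocking factor of 10, and 2 spare record positions per block); the function name is hypothetical.

```python
import math


def blocks_needed(num_records, blocking_factor, spare_per_block=0):
    """Blocks required when each block initially holds
    blocking_factor - spare_per_block records."""
    initial_per_block = blocking_factor - spare_per_block
    if initial_per_block <= 0:
        raise ValueError("no room left for records in each block")
    return math.ceil(num_records / initial_per_block)


full_blocks = blocks_needed(1000, 10)         # every block completely filled
with_space = blocks_needed(1000, 10, 2)       # 2 free positions left per block
```

With these figures, leaving two free positions per block raises the number of blocks from 100 to 125, which is the price paid for delaying overflow.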
Blocking packing density refers to the ratio of tracks initially set aside for records to the number
of available tracks on the cylinder.
File reorganisation
If records are continually added to and deleted from an indexed sequential file, a large proportion
of records will end up in the overflow area. This increases access time, since several blocks may have
to be read to locate a record. It therefore becomes necessary to reorganise the file, i.e. copy the records
to another file, allowing free space in each block for additional records and recreating the indexes at the
same time.