7.3 Section 3 File Organisation

The document outlines file organization and database concepts, detailing the hierarchy of data from bits to databases. It discusses variable and fixed length records, transaction and master files, and various file organization methods including serial, sequential, and indexed sequential. Additionally, it covers the advantages and disadvantages of each method, as well as the processes for adding, deleting, and updating records.

SIR.

OWEN NYAMAROPA

7.3 SECTION 3: FILE ORGANISATION AND DATABASE CONCEPTS

RECORDS AND FILES

HIERARCHY OF DATA
BIT All data is stored in a computer’s memory or storage devices in the form of binary
digits or bits. A bit can be either ‘ON’ or ‘OFF’, representing 1 or 0.
BYTE A group of eight bits. One byte represents one character or, in other contexts,
other data such as a sound or part of a picture. The most common code used to
represent characters is ASCII (American Standard Code for Information Interchange).
FIELD Characters are grouped together to form fields. Data held about a person can be
split into many fields, e.g. ID Number, Surname, First Name, Address, DOB, etc.
RECORD All the information about one person or item is held in a record.
FILE A file is a collection of records. A stock file will contain a record for each item of
stock, a payroll file a record for each employee and so on.
DATABASE A database may consist of many different files, linked together so that information
can be retrieved from several files simultaneously.
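As a small illustration of the lower levels of this hierarchy, the Python sketch below shows one character as an ASCII code and as the eight bits of a byte, then groups characters into fields, fields into a record, and records into a file. The field names and values are made-up sample data, not taken from the notes.

```python
# One character occupies one byte; in ASCII the letter 'A' has code 65.
code = ord("A")
bits = format(code, "08b")      # the eight bits of that byte

# Characters are grouped into fields, and fields into a record.
# (Hypothetical sample data.)
record = {"id_number": "1001", "surname": "Moyo", "first_name": "Tendai"}

# A file is a collection of records.
person_file = [record]

print(code, bits, len(person_file))
```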

Variable length records


 Records in a file may not all be of the same length. They are called variable length records.
Variable length records may be used when either:
 The number of characters in a field varies between records.
 Records have a varying number of fields.
 A variable length record has to have some way of showing where each field ends, and where the
record ends, in order that it can be processed. There are two ways of doing this:
 Use a special end-of-field character at the end of each field, and an end-of-record marker at
the end of each record.
 Use a character count at the beginning of each field and an end-of-record marker.
Advantages of variable length records
 Less space is wasted on the storage medium
 No truncation of data occurs
 It enables as many fields as necessary to be held on a particular record.
 It may reduce time taken to read a file because the records are more tightly packed.
Disadvantages
 The processing required to separate out the fields is more complex.
 The record cannot be updated in situ, because the updated record may not be the same length as the original.
 It is harder to estimate file sizes accurately when a new system is being designed.
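The two ways of marking field ends described above can be sketched in Python as follows. This is only an illustration: the choice of "|" as the end-of-field character, a newline as the end-of-record marker and a two-digit character count are assumptions made here, and the sample fields are invented.

```python
FIELD_END = "|"     # assumed end-of-field character
RECORD_END = "\n"   # assumed end-of-record marker

def encode_delimited(fields):
    """Method 1: end-of-field character after each field, then end-of-record marker."""
    return "".join(f + FIELD_END for f in fields) + RECORD_END

def decode_delimited(text):
    body = text[:-len(RECORD_END)]
    return body.split(FIELD_END)[:-1]   # drop the empty piece after the last marker

def encode_counted(fields):
    """Method 2: two-digit character count before each field, then end-of-record marker."""
    return "".join(f"{len(f):02d}{f}" for f in fields) + RECORD_END

def decode_counted(text):
    fields, pos = [], 0
    while text[pos] != RECORD_END:
        n = int(text[pos:pos + 2])              # read the character count
        fields.append(text[pos + 2:pos + 2 + n])
        pos += 2 + n                            # jump to the next count
    return fields

sample = ["1001", "Moyo", "12 Main St"]
print(repr(encode_delimited(sample)))
print(repr(encode_counted(sample)))
```

Note the extra processing needed to separate the fields: method 1 must scan for markers (and the data must never contain the marker character), while method 2 must read each count before it can find the next field.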

Fixed length records
 When data is stored in fixed length records, the same number of bytes is allocated to each data
item (field), regardless of how much data is actually stored in it.
Advantages
 Data can be updated in situ
 It’s possible to estimate file size accurately when a new system is being designed.
Disadvantages
 It wastes space on storage medium.
 Data that is too long for its field is truncated.
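A minimal Python sketch of fixed length records, using the standard struct module and an in-memory io.BytesIO object as a stand-in for a disk file. The record layout (4-byte ID, 10-byte surname, 4-byte integer quantity) and the sample data are assumptions chosen for illustration.

```python
import io
import struct

# Assumed layout: 4-byte ID, 10-byte surname, 4-byte integer quantity.
RECORD = struct.Struct("<4s10si")       # every record is exactly 18 bytes

def pack(rec_id, surname, qty):
    # struct pads short byte strings with zero bytes and truncates long ones
    return RECORD.pack(rec_id.encode("ascii"), surname.encode("ascii"), qty)

def unpack(raw):
    rec_id, surname, qty = RECORD.unpack(raw)
    return rec_id.decode("ascii"), surname.rstrip(b"\0").decode("ascii"), qty

f = io.BytesIO()
f.write(pack("0001", "Moyo", 5))
f.write(pack("0002", "Chamberlain-Jones", 7))   # surname longer than 10 bytes

# Record n starts at byte n * RECORD.size, so it can be updated in situ.
f.seek(1 * RECORD.size)
rec_id, surname, qty = unpack(f.read(RECORD.size))
f.seek(1 * RECORD.size)
f.write(pack(rec_id, surname, qty + 3))         # overwrite at the same position
```

The sketch shows both sides of the trade-off: the file size is simply the number of records times 18 bytes and a record can be overwritten in place, but "Moyo" wastes six bytes and the long surname is cut off at ten characters.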

Transaction file
 A transaction file is a collection of records used in batch processing to update a master file.
 It contains data of all transactions that have occurred in the last period. A period may be a day, a
week, or a month.
Master files
 A master file is a permanent file of data which is the principal source of information for a job.
 Master files are kept up-to-date by applying the transactions that occur during the operation of
the business. They contain two basic types of data:
 Data of a more or less permanent nature such as name, address, rate of pay etc.
 Data which will change every time transactions are applied to the file, e.g. gross pay to date,
tax paid to date, etc.

File organisation
 Files stored on magnetic media can be organised in a number of ways. The method chosen will
depend on several factors such as:
 How the file is to be used
 How many records are processed each time the file is updated.
 Whether individual records need to be quickly accessed.
Serial file organisation
 Records on serial files are not in any particular sequence.
 Records are stored in the order in which they are received, with new records added to the end
of the file.
 Serial files are used as temporary files to store transaction data.
Access method
 Serial access: records are read from the disk into main memory one after the other, i.e.
in the order in which they occur on the disk.
 The method can be used with magnetic tape.
Adding a record – new records are simply appended to the end of the file.
Deleting a record – create a new tape, copy all the records up to the record to be deleted, leaving
that one off the new tape and then copy out all the rest of the records to the new tape.
Uses of serial files
 Used as a transaction file (TF) for recording data in the order in which events take place.
 Used to update the master file (MF) and to restore data in the event of a disaster such as a disk head crash.
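The add and delete operations on a serial file can be sketched in Python as below. A list stands in for the tape or disk file, and the function names and record layout are hypothetical.

```python
def add_record(serial_file, record):
    """New records are simply appended to the end of the file."""
    serial_file.append(record)

def delete_record(old_file, key):
    """Copy every record except the one to be deleted to a new file."""
    new_file = []
    for record in old_file:             # read serially, in stored order
        if record["key"] != key:
            new_file.append(record)
    return new_file

transactions = []
for key in (42, 7, 19):                 # arrival order, not key order
    add_record(transactions, {"key": key})

transactions = delete_record(transactions, 7)
print([r["key"] for r in transactions])
```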

Sequential file organisation


 Records are stored one after the other but in a sequence according to the record key.
 The method of access is serial/sequential.
 The method can be used with magnetic tape or magnetic disk.

Adding a record – make a new copy of the file, copying over all records until the new one can be written in
its proper place, and then copy over the rest of the records.

Deleting a record – copy all the records to a new tape/disk, leaving out the
record to be deleted.

Uses of sequential files


 Used as a master file for high hit rate applications such as payroll.
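Adding a record to a sequential file, as described above, can be sketched in Python like this. Lists stand in for the old and new files, and the function name is hypothetical.

```python
def insert_in_sequence(old_file, new_record):
    """Copy records to a new file, writing new_record in its proper key position."""
    new_file, written = [], False
    for record in old_file:
        if not written and new_record["key"] < record["key"]:
            new_file.append(new_record)     # the proper place has been reached
            written = True
        new_file.append(record)             # copy over the existing record
    if not written:                         # new key is higher than every existing key
        new_file.append(new_record)
    return new_file

master = [{"key": 10}, {"key": 20}, {"key": 40}]
master = insert_in_sequence(master, {"key": 30})
print([r["key"] for r in master])
```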

Updating sequential files
 A master file (MF) is updated when one or more of its records are altered by applying a transaction or a file of
transactions (TF) to it.
 The method used to update a sequential file is called updating by copying.
 It requires the transaction file to be sorted in the same order as the MF.

Figure: Grandfather-Father-Son method of updating.
MF Day 1 (Grandfather) + TF Day 1 -> Update -> MF Day 2 (Father)
MF Day 2 (Father) + TF Day 2 -> Update -> MF Day 3 (Son)

The steps are as follows:

1. A record is read from the MF into memory.
2. A record is read from the TF into memory.
3. The record keys from each file are compared. If no updating is required to the MF record in
memory (the master key is less than the transaction key), the master record is copied from
memory to a new MF on a different tape or area of disk and another MF record is read into
memory, overwriting the previous one. This step is then repeated.
4. If there is a transaction for the master record currently in memory, the record is updated. It is
retained in memory in case there are any more transactions that apply to it. Steps 2 to 4 are
then repeated.

NB: Three versions of the MF will be created; this is called the grandfather-father-son method.
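One way to picture the three generations is as a queue that never holds more than three master files. The Python sketch below assumes, as is usual with this method, that older generations are discarded once three exist; the file names are invented.

```python
from collections import deque

# At most three generations are kept: son, father, grandfather (newest first).
generations = deque(maxlen=3)

def keep(master_file_name):
    """Record a newly created master file; the oldest generation drops off."""
    generations.appendleft(master_file_name)

for name in ["MF day 1", "MF day 2", "MF day 3", "MF day 4"]:
    keep(name)

son, father, grandfather = generations
print(son, father, grandfather)
```

If the son is lost or corrupted, it can be recreated by applying the corresponding transaction file to the father again, which is why the older generations and their transaction files are retained.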

Algorithm to update a sequential master file

Open master file for reading
Open transaction file for reading
Open new master file for writing
Read first master record
Repeat
    Read next transaction record
    While master record key < transaction record key
        Write master record to new master file
        Read next master record
    EndWhile
    Update master record in memory
Until End Of File (transaction)
Write master record to new master file
While Not End Of File (master)
    Read next master record
    Write master record to new master file
EndWhile
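The update-by-copying algorithm can also be written as a short Python function. This is a sketch under stated assumptions: lists stand in for the files, both are already sorted by key, every transaction matches some master record, and "updating" means adding a hypothetical amount field to a hypothetical balance field.

```python
def update_master(master, transactions):
    """Return a new master file; the old one is left untouched."""
    new_master = []
    records = iter(master)
    current = next(records, None)           # master record held in memory
    for t in transactions:
        # Copy across master records that have no (more) transactions.
        while current is not None and current["key"] < t["key"]:
            new_master.append(current)
            current = next(records, None)
        if current is None or current["key"] != t["key"]:
            raise ValueError(f"no master record for key {t['key']}")
        # Update in memory; keep it there in case more transactions apply.
        current = {**current, "balance": current["balance"] + t["amount"]}
    # Transaction file exhausted: copy the rest of the master file.
    while current is not None:
        new_master.append(current)
        current = next(records, None)
    return new_master

father = [{"key": 1, "balance": 10}, {"key": 2, "balance": 20}, {"key": 3, "balance": 30}]
tf = [{"key": 2, "amount": 5}, {"key": 2, "amount": -1}, {"key": 3, "amount": 10}]
son = update_master(father, tf)
print(son)
```

Because the function builds a new list instead of changing the old one, the previous generation of the master file survives the update.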

Random files (hash file, direct or relative file)


 A direct access file is a collection of records, where each record is stored at a disk address
calculated from the record’s primary key.
 Records are stored or retrieved according to either their disk address or their relative position within
the file.
 With relative file addressing, record number 1 is stored in block 1, record number 2 in block 2
and so on.
 Using relative position to store the records can waste space, e.g. if there are 1 000 records to
store and each record key is 5 digits, we need 99999 blocks to store the records, most of which stay empty.
 Using disk address: a hashing algorithm is used to translate the key into an address.
 Synonyms (collisions) occur when two record keys generate the same address.
 One way of resolving synonyms is to place the record that caused the collision in the next available free
space. When the highest address is reached, the next record can be stored at address 0 (known
as wrap round).
 When this method is used, a search for a record has to continue until the record is
found or a blank space is found.
 Another method is to use a separate overflow area and leave a tag in the original location to
indicate where to look next.

Adding a record
 Apply the hashing algorithm to the key field to generate the storage address. E.g. the address of record
75481 would be calculated as follows:
75481/1000 = 75 remainder 481 (where 1000 is the number of records to be stored).
Address = 481
 If the address is already full, the record can be put in the next available place, or a tag can be left in
the original location and the record placed in an overflow area.
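A minimal Python sketch of adding and finding records with the division-remainder hash used in the example, resolving synonyms by moving to the next free space with wrap round. A list of 1000 slots stands in for the disk addresses; the function names and the stored data values are hypothetical.

```python
TABLE_SIZE = 1000                   # number of addresses, as in the example
table = [None] * TABLE_SIZE         # None marks a blank space

def hash_address(key):
    return key % TABLE_SIZE         # remainder after dividing by 1000

def add_record(key, data):
    address = hash_address(key)
    for _ in range(TABLE_SIZE):
        if table[address] is None:
            table[address] = (key, data)
            return address
        address = (address + 1) % TABLE_SIZE    # next space, wrapping round to 0
    raise RuntimeError("file is full")

def find_record(key):
    address = hash_address(key)
    for _ in range(TABLE_SIZE):
        slot = table[address]
        if slot is None:
            return None             # blank space reached: record is not in the file
        if slot[0] == key:
            return slot[1]
        address = (address + 1) % TABLE_SIZE
    return None

print(add_record(75481, "first"))       # home address 481
print(add_record(12481, "synonym"))     # 481 is taken, so the next space is used
```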
Properties of a good hashing algorithm
The algorithm should be chosen so that:
 It can generate any of the available addresses on the file;
 It is fast to calculate;
 It minimises collisions (synonyms).

Deleting a record
 Leave the deleted record in place and set a flag labelling it as deleted. The record will be logically
deleted but physically present.
Updating the file
 A record is read into memory, updated, and written back to its original location.
 This is called updating by overlay, updating in place or updating in situ.
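Deletion by flag and updating in situ can be sketched in Python as follows, on a deliberately tiny hashed file of 10 addresses (an assumption made so the example is easy to trace). Leaving the flagged record in place matters because a search stops at a blank space: blanking the slot could hide synonyms stored after it.

```python
SIZE = 10
slots = [None] * SIZE       # each used slot holds a dict: key, data, deleted flag

def probe(key):
    """Yield addresses in search order, starting at the home address."""
    start = key % SIZE
    for i in range(SIZE):
        yield (start + i) % SIZE

def add(key, data):
    for a in probe(key):
        if slots[a] is None:
            slots[a] = {"key": key, "data": data, "deleted": False}
            return a
    raise RuntimeError("file is full")

def find(key):
    for a in probe(key):
        slot = slots[a]
        if slot is None:
            return None                     # blank space: key is not in the file
        if slot["key"] == key and not slot["deleted"]:
            return a
        # a flagged (deleted) slot does not stop the search
    return None

def delete(key):
    a = find(key)
    if a is not None:
        slots[a]["deleted"] = True          # logically deleted, physically present

def update(key, data):
    a = find(key)
    if a is not None:
        slots[a]["data"] = data             # overwritten at its original location

add(13, "a")        # home address 3
add(23, "b")        # synonym of 13, stored at address 4
delete(13)
update(23, "c")
```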

Use of direct access files


 Used in situations where extremely fast access to individual records is required, e.g. in an airline
booking system where thousands of bookings are made every day for each airline from terminals
all over the country.

Indexed sequential file organisation


 Records are held in sequence in blocks and space is left in each block when the file is created so
that additional records can be added and the correct sequence maintained.
 An index is held at the front of the file showing the highest key in each block of records.
 Records that will not fit in the home block have to be placed in an overflow area, and a tag
giving the record key and its overflow address is left in the home block to show where the record
is.
 An indexed sequential file consists of 3 areas:
1. A home area where the records are initially stored.
2. An index area containing, for each block address, the highest key in the block.
3. An overflow area to hold records that have been subsequently added and will not fit into
the correct home block.
 For a file held on a disk pack, more than one level of index is required. The indexing technique
used is cylinder-surface-sector indexing.
 For each disk pack in the file, there is a cylinder index or primary index which is read into
memory and held there while the file is in use.
 It contains a list of the highest key in each cylinder of the file.
 When looking for a record with a particular key, this index is searched from the beginning until
an entry is found which is greater than or equal to the key required. The read/write heads are then
moved to that cylinder; this process is called seeking.
 A cylinder on a disk is made up of all the tracks that can be accessed from one position of the read/write heads.
 Having reached the correct cylinder, a further index is read and searched. This is the surface
index or secondary index, which holds a list of surface numbers and the highest key to be found
on each.
 By comparing these hi-keys with the one required, the correct surface can be selected. The
process is known as switching.
 Once on the right track, a third level of index, the sector index, can be read and searched to give
the number of the sector in which the record is to be found.
Example: looking for a record with key 5584

Cylinder index
Cylinder   Hi-key
0          193
1          346
.          …
19         4382
20         5495
21         6608
.          …
199        49999

 Searching the index, the first hi-key which is greater than or equal to 5584 is 6608, so the record, if it
exists, is on cylinder 21. Thus, the read/write heads are moved to cylinder 21. On arrival, the surface
index is located on surface 0 and is read.

Surface index
Surface    Hi-key
0          5510
1          5622
2          5843
.          ……
.          ……
7          6608

 This means that the record 5584 should be on surface 1, so the read head of that surface is
activated. The sector index located on sector 0 of cylinder 21, surface 1 is then read.

Sector index
Sector     Hi-key
0          5521
1          5538
2          5560
3          5568
4          5583
5          5597
6          5606
7          5622

 The record with key 5584 should be in sector 5, so sector 5 is read into memory. It will then be
serially searched until the correct record is located. If it is not found then:
 Either the record does not exist.
 Or when the record was added to the file, there was no room for it in cylinder 21, surface 1,
sector 5 and so the record was overflowed elsewhere.
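The three index searches in this example can be reproduced with a short Python sketch. Only the index rows actually shown in the example are included; the rows shown as "…" are left out, not guessed. The function name lookup is hypothetical.

```python
# (hi-key, number) pairs copied from the example; elided rows are omitted.
cylinder_index = [(193, 0), (346, 1), (4382, 19), (5495, 20), (6608, 21), (49999, 199)]
surface_index = [(5510, 0), (5622, 1), (5843, 2), (6608, 7)]
sector_index = [(5521, 0), (5538, 1), (5560, 2), (5568, 3),
                (5583, 4), (5597, 5), (5606, 6), (5622, 7)]

def lookup(index, key):
    """Search from the beginning for the first hi-key >= key."""
    for hi_key, number in index:
        if hi_key >= key:
            return number
    return None                     # key is higher than every hi-key in the index

key = 5584
cylinder = lookup(cylinder_index, key)
surface = lookup(surface_index, key)
sector = lookup(sector_index, key)
print(cylinder, surface, sector)
```

In a real file the surface index would be read only after the heads reach the chosen cylinder, and the sector index only after the surface is selected; here all three are simply held in memory.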

Advantages of the file organisation


 Faster than serial search of a sequential file.
 It can be processed either randomly using the indexes or sequentially without using the indexes
Disadvantages
 Reading and searching the indexes on disk before each record access is time consuming.
 The indexes take up quite a lot of space.

Blocks
 Both disks and tapes transfer data between CPU and backing store in chunks called blocks.
 Blocking factor is the number of records stored in each block.
 A block can be called a sector on a disk.
 A user must specify the number of records in each sector/block when setting up an indexed sequential
file.
 A blocking strategy is to put several records in one block, but to leave enough free space for
extra records to fit in each block before overflow occurs.
 Blocking packing density refers to the ratio of tracks initially set aside for records to the number
of available tracks on the cylinder.

File reorganisation
If records are continually added to and deleted from an indexed sequential file, a large proportion
of records will end up in the overflow area. This increases access time, since several blocks may have
to be read to locate a record. So it becomes necessary to reorganise the file, i.e. copy the records to another
file, allowing free space in each block for additional records and recreating the indexes at the same
time.
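Reorganisation can be sketched in Python as below. This is a simplified illustration: records are represented by their keys only, lists stand in for blocks, and the parameter fill (how many records to put in each new block, leaving the rest of the block free) is a name introduced here.

```python
def reorganise(home_blocks, overflow, fill):
    """Merge home and overflow records in key order, refill the blocks
    with `fill` records each, and recreate the index of highest keys."""
    keys = sorted([k for block in home_blocks for k in block] + list(overflow))
    new_blocks = [keys[i:i + fill] for i in range(0, len(keys), fill)]
    index = [block[-1] for block in new_blocks]     # highest key in each block
    return new_blocks, index

# Blocks that can hold 3 records are full, and 3 records have overflowed.
home = [[10, 20, 30], [40, 50, 60]]
overflow = [25, 35, 45]

blocks, index = reorganise(home, overflow, fill=2)
print(blocks)
print(index)
```

After reorganisation every record is back in a home block in key sequence, the overflow area is empty, and each full block again has free space for later additions.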

Uses of indexed sequential files


 Suitable for real-time stock control systems, e.g. when a purchase is made the stock record can be
immediately updated and written back to the file (updated in situ). The file can also be
processed sequentially, without using the index at all, when reports of sales or stock are needed.
