Sequential Files
Email: abderrahim.lakehal@univ-setif.dz
THE BLOCKS OF A FILE
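[Figure: blocks F1 to F4 laid out contiguously (array view) and blocks D1 to D4 scattered and linked together (list view).]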
• Either the file is viewed as an array (F): all the blocks that make it up are contiguous.
• Or the file is viewed as a list (D): the blocks are not necessarily contiguous, but are linked together.
1 - Global Organization of Blocks
The characteristics required to manage a file viewed as an array include:
• The number of the first block,
• The number of the last block (or alternatively the number of blocks used).
2 - Internal Organization of Blocks
The blocks are supposed to contain the records of a file.
❑ A record has a variable format when its structure contains one or more fields of variable size (for example, fields of list type), or when the number of fields varies from one record to another.
[Figure: a record made up of fields.]
• A variable-length record will be viewed as a sequence of bytes or characters (of variable length).
• To separate the fields within the record, either a special character can be used, or the fields can be prefixed by
their size.
❑ In the case of variable-sized records, the block cannot be defined as an array of records because the
elements of an array must always be of the same size.
❑ The solution is to consider the block as a large fixed-size character array containing the different records
(stored character by character).
Note: Even if the records are of variable lengths, the block size remains fixed.
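The block layout described above can be sketched as follows. This is only an illustration under assumptions that are not in the slides: the block capacity `BLOCK_SIZE`, the 2-digit field-size prefix, the 3-digit record-size prefix, and the names `encode_record`, `decode_fields` and `Block` are all hypothetical.

```python
BLOCK_SIZE = 40  # hypothetical block capacity, in characters


def encode_record(fields):
    """Turn a list of string fields into one variable-length string."""
    # Each field is prefixed by its size, written on 2 digits (assumption).
    body = "".join(f"{len(f):02d}{f}" for f in fields)
    # The record itself is prefixed by its total size, on 3 digits (assumption).
    return f"{len(body):03d}{body}"


def decode_fields(body):
    """Split a record body back into its fields using the size prefixes."""
    fields, pos = [], 0
    while pos < len(body):
        size = int(body[pos:pos + 2])
        fields.append(body[pos + 2:pos + 2 + size])
        pos += 2 + size
    return fields


class Block:
    def __init__(self):
        self.chars = [" "] * BLOCK_SIZE  # fixed-size character array
        self.used = 0                    # number of characters in use

    def append(self, fields):
        rec = encode_record(fields)
        if self.used + len(rec) > BLOCK_SIZE:
            return False                 # no overlap: the record must fit entirely
        # Store the record character by character in the free space.
        self.chars[self.used:self.used + len(rec)] = rec
        self.used += len(rec)
        return True

    def records(self):
        out, pos = [], 0
        while pos < self.used:
            size = int("".join(self.chars[pos:pos + 3]))
            body = "".join(self.chars[pos + 3:pos + 3 + size])
            out.append(decode_fields(body))
            pos += 3 + size
        return out
```

Whatever the length of the stored records, `len(block.chars)` stays equal to `BLOCK_SIZE`, which matches the note above.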
2 - Internal Organization of Blocks (variable-length records with or without overlap)
❑ To minimize wasted space in blocks (in the case of variable format only), one can opt for an organization with
overlap between two or more blocks.
❑ When inserting a new record into a block that is not yet full, and the remaining empty space is not sufficient to
fully contain the record, it is split into two parts in such a way that the first part occupies all the empty space
in the block, while the rest (the second part) is inserted into a new block allocated to the file.
❑ It is then said that the record spans across 2 blocks.
❑ This approach can easily be generalized to support records that span multiple blocks (as in the case of
large records, possibly larger than a physical block).
[Figure: a record overlapping the end of block 1 and the beginning of block 2.]
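A minimal sketch of insertion with overlap is given below, assuming that each block is modeled as a Python string of at most `BLOCK_SIZE` characters and that the file is a list of such blocks. The capacity value and the function name `insert_with_overlap` are hypothetical; a real record would also carry a size prefix or a separator so that it can be read back.

```python
BLOCK_SIZE = 10  # hypothetical block capacity, in characters


def insert_with_overlap(blocks, record):
    """Append `record` (a string) at the end of the file.

    If the last block cannot hold the whole record, the first part fills
    its remaining free space and the rest goes into newly allocated blocks.
    """
    if not blocks:
        blocks.append("")
    rest = record
    while rest:
        free = BLOCK_SIZE - len(blocks[-1])
        if free == 0:
            blocks.append("")        # allocate a new block for the file
            free = BLOCK_SIZE
        blocks[-1] += rest[:free]    # this part occupies the empty space
        rest = rest[free:]           # what remains spills into the next block
```

For example, inserting a 7-character record and then a 5-character record yields a first block that is completely full and a second block holding the last 2 characters. The same loop handles a record larger than a block, which then spans several blocks.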
TAXONOMY OF SIMPLE FILE STRUCTURES
Sequential access methods for organizing data on disk use the following notation:
T: for a file viewed as a table, L: for a file viewed as a list
O: for an ordered file, Ō: for an unordered file
F: for fixed-format records, V: for variable-format records
C: with overlap of records between blocks, C̄: without overlap
The leaves of the following tree represent the 12 sequential access methods:
Sequential files
    T
        O: F | V (C or C̄)
        Ō: F | V (C or C̄)
    L
        O: F | V (C or C̄)
        Ō: F | V (C or C̄)
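The count of 12 can be checked by enumerating the combinations, as in the sketch below. It assumes, as stated earlier for wasted space, that overlap is an option only for variable-format records; the function name `sequential_methods` is hypothetical.

```python
from itertools import product


def sequential_methods():
    """List the names of the sequential access methods of the taxonomy."""
    names = []
    for view, order, fmt in product("TL", ["O", "Ō"], "FV"):
        if fmt == "F":
            # Fixed format: overlap is not considered, so this is a leaf.
            names.append(view + order + fmt)
        else:
            # Variable format: with overlap (C) or without overlap (C̄).
            names += [view + order + "V" + c for c in ("C", "C̄")]
    return names
```

There are 2 x 2 = 4 fixed-format methods and 2 x 2 x 2 = 8 variable-format methods, hence 12 leaves.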
For example, the T Ō V C method represents the organization of a file viewed as a table (T),
unordered (Ō), with variable-sized records (V) and accepting overlaps between blocks (C):
[Figure: records stored one after another in the blocks, some of them overlapping two blocks.]
Search is sequential, insertion is done at the end of the file, and deletion is logical.
In the case of an LOF file (file viewed as a list, ordered with fixed-size records), each block may
contain, for example, a record array (tab), an integer indicating the number of records in the array
(nb), and an integer to keep track of the next block in the list (next):
[Figure: the file header (HEAD), the linked blocks, and the records they contain.]
The search is sequential, insertion only causes intra-block shifts (to maintain the order of records), and deletion
can be either logical or physical.
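The block structure and the sequential search of an LOF file could look like the sketch below. It is a simplification under stated assumptions: records are reduced to their keys, `nb` is derived from the array length instead of being stored, `-1` marks the end of the list, and the file is a dictionary mapping block numbers to blocks. None of these choices come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    tab: list = field(default_factory=list)  # record keys, in ascending order
    next: int = -1                           # number of the next block, -1 for none

    @property
    def nb(self):
        return len(self.tab)                 # number of records in tab


def search(blocks, head, key):
    """Follow the list from `head`; return (found, block number, position)."""
    i = head
    while i != -1:
        block = blocks[i]
        for j in range(block.nb):
            if block.tab[j] == key:
                return True, i, j
            if block.tab[j] > key:
                return False, i, j           # ordered file: stop at the first larger key
        i = block.next
    return False, -1, 0                      # key is larger than every stored key
```

Because the file is ordered, an unsuccessful search can stop as soon as a larger key is met, and the returned position is where the key would be inserted.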
❑ Deletion can be done by reverse shifts (physical deletion, which is costly) or simply by using a Boolean indicator
(logical deletion, which is much faster).
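The two deletion strategies can be contrasted on a single block, as sketched below. Records are modeled here as dictionaries with a `key` and a `deleted` flag; this representation and the function names are assumptions made for the illustration.

```python
def delete_physical(tab, j):
    """Remove tab[j] by shifting the following records one slot to the left."""
    for k in range(j, len(tab) - 1):
        tab[k] = tab[k + 1]
    tab.pop()                      # the last slot is now free


def delete_logical(tab, j):
    """Mark tab[j] as deleted; nothing moves and no space is freed."""
    tab[j]["deleted"] = True


def live_keys(tab):
    """Keys of the records that are not logically deleted."""
    return [r["key"] for r in tab if not r["deleted"]]
```

Physical deletion touches every record after position `j`, whereas logical deletion changes a single field, which is why it is faster but leaves the slot occupied.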
❑ The initial loading operation consists of constructing an ordered file with n initial records, leaving
some empty space in each block. This will help minimize the shifts that might be caused by future
insertions.
❑ Over time, the file's load factor (number of insertions / number of available spaces in the file)
increases due to future insertions, and logical deletions do not free up spaces. As a result,
performance tends to degrade over time. It is therefore recommended to reorganize the file by
performing a new initial load. This is the periodic reorganization operation.
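As a small worked example of the load factor, the sketch below assumes that the number of available spaces equals the number of blocks times the block capacity `b`, and that reorganization is triggered above a threshold. The threshold value 0.9 and both function names are hypothetical.

```python
def load_factor(nb_insertions, nb_blocks, b):
    """Number of insertions divided by the number of available spaces."""
    return nb_insertions / (nb_blocks * b)


def needs_reorganization(nb_insertions, nb_blocks, b, threshold=0.9):
    """True when the load factor reaches the (assumed) threshold."""
    return load_factor(nb_insertions, nb_blocks, b) >= threshold
```

With 3 blocks of capacity 30, 45 insertions give a load factor of 45 / 90 = 0.5, while 85 insertions give about 0.94 and would trigger a reorganization under this threshold.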
FILE DECLARATION
Let b = 30 // maximum capacity of the blocks (in number of records)
Tbloc = struct // structure of a block
    tab: array[b] of Trec // an array of records with a maximum capacity of b
    NB: integer // number of records in tab (≤ b)
End
[Figure: a record of type Trec with the fields Deleted, Key, field3 and field4; block i of type Tbloc, made of NB (an integer) and tab, an array of 30 records of type Trec indexed from 0 to 29.]
GLOBAL VARIABLES: F AND BUF
• The first is used to keep track of the number of blocks used (or the logical number of the last block in the file).
• The second will serve as an insertion counter to quickly calculate the load factor, and thus determine whether file reorganization is necessary.
SEARCH MODULE: (BINARY SEARCH)
Input:
The key (c) to search for.
Output:
The boolean Trouv, the block number (i) containing the key, and the index (j) (position within
the block).
The cost of the search operation is logarithmic because binary search performs, in the worst case,
log₂ N block reads for a file consisting of N blocks. The complexity is O(log N).
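The search module can be sketched as a binary search over the blocks followed by a binary search inside the selected block. This is an illustration only, assuming an ordered file viewed as an array, modeled as a list of non-empty sorted lists of keys; the convention used for the returned position when the key is absent is also an assumption.

```python
def search(blocks, c):
    """Return (trouv, i, j): found flag, block number, position in the block."""
    lo, hi = 0, len(blocks) - 1
    while lo <= hi:
        i = (lo + hi) // 2
        tab = blocks[i]                  # one block read per iteration
        if c < tab[0]:
            hi = i - 1                   # c can only be in an earlier block
        elif c > tab[-1]:
            lo = i + 1                   # c can only be in a later block
        else:
            # c lies within the key range of block i: search inside it.
            a, z = 0, len(tab) - 1
            while a <= z:
                j = (a + z) // 2
                if tab[j] == c:
                    return True, i, j
                if tab[j] < c:
                    a = j + 1
                else:
                    z = j - 1
            return False, i, a           # a is where c would be inserted
    if blocks and lo >= len(blocks):
        i = len(blocks) - 1
        return False, i, len(blocks[i])  # c is larger than every key
    return False, lo, 0                  # c falls before block lo
```

Each iteration of the outer loop halves the range of candidate blocks, which is where the bound of about log₂ N block reads comes from.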
INSERTION MODULE: (WITH POSSIBLE INTRA- AND INTER-BLOCK SHIFTS)
Input:
The record to be inserted (e). It contains the key used to search for the insertion location (block i, position j).
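One possible shape for this insertion is sketched below, with records reduced to their keys and the file modeled as a list of non-empty sorted lists. For brevity the location is found with a simple scan over the blocks instead of the search module; the capacity `B`, the rejection of duplicate keys and the function name are assumptions.

```python
import bisect

B = 3  # hypothetical maximum number of records per block


def insert(blocks, e):
    """Insert key e in order; return False if it is already present."""
    if not blocks:
        blocks.append([e])
        return True
    # Locate block i: the first block whose last key is >= e, else the last block.
    i = 0
    while i < len(blocks) - 1 and blocks[i][-1] < e:
        i += 1
    j = bisect.bisect_left(blocks[i], e)
    if j < len(blocks[i]) and blocks[i][j] == e:
        return False
    blocks[i].insert(j, e)               # intra-block shift
    # Inter-block shifts: push the overflowing record into the next block.
    while len(blocks[i]) > B:
        carry = blocks[i].pop()
        if i + 1 == len(blocks):
            blocks.append([])            # allocate a new last block
        blocks[i + 1].insert(0, carry)
        i += 1
    return True
```

Inserting 25 into `[[10, 20, 30], [40, 50, 60]]` with `B = 3` pushes 30 into the second block and 60 into a new third block, which shows how one insertion can cascade across full blocks.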
LOGICAL DELETION:
Consists of searching for the record and setting its 'deleted' field to true.
Input:
The location of the record to delete (record j in block i).
INITIAL LOADING
Input:
U: a decimal value between 0 and 1 that indicates the loading rate.
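A minimal sketch of the initial loading is given below, assuming that each block receives the whole part of u x b records, that the input keys are sorted first, and that the file is returned as a list of blocks. The function name and the rounding rule are assumptions.

```python
def initial_load(records, b, u):
    """Build an ordered file whose blocks are filled at rate u (0 < u <= 1)."""
    # Records per block; the small epsilon guards against floating-point error.
    per_block = max(1, int(u * b + 1e-9))
    ordered = sorted(records)
    return [ordered[k:k + per_block] for k in range(0, len(ordered), per_block)]
```

With b = 4 and u = 0.5, each block receives 2 records and keeps 2 free slots for future insertions. A reorganization could reuse the same function on the records that are not logically deleted.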
REORGANIZATION