Chapter 6 Organizing Files For Performance Not Complete
2021‐2022
Chapter 6
Organizing Files for
Performance
File Compression
Contents
• Total: 10 characters
• Encoded message:
1010010011011011001111100
Index:  0  1  2   3  4   5   6    7   8     9    10
Piece:  0  a  aa  b  ab  bb  aaa  ba  aaaa  aab  aabb
(0 = null string)
Encoding:
Index:  1   2   3   4   5   6   7   8   9   10
Code:   0a  1a  0b  1b  3b  2a  3a  6a  2b  9b
Lempel-Ziv Codes
• Since each piece is the concatenation of a piece already seen
with a new character, the message can be encoded by a
previous index plus a new character.
• A tree can be built when encoding
"aaabbcbcdddeab"
Solution
Encoding Tree
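The index-plus-character scheme described above can be sketched in C++ as follows. This is a minimal illustration, not the course's reference solution: the names lzEncode, lzDecode and Code are hypothetical, the digital tree is stored as a vector of child maps, and a trailing match with no new character is emitted with '\0' as a placeholder (an assumption of this sketch).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One output unit: index of a previously seen piece plus one new character.
using Code = std::pair<int, char>;

// Encode msg as (previous index, new character) pairs.
// Node 0 of the tree is the null string and node i is the i-th piece,
// so node numbers double as piece indexes.
std::vector<Code> lzEncode(const std::string& msg) {
    std::vector<std::map<char, int>> tree(1);
    std::vector<Code> out;
    int node = 0;  // current position in the tree (longest match so far)
    for (char c : msg) {
        auto it = tree[node].find(c);
        if (it != tree[node].end()) {
            node = it->second;          // piece already seen: keep extending
        } else {
            out.emplace_back(node, c);  // new piece = piece `node` + c
            int next = static_cast<int>(tree.size());
            tree[node][c] = next;       // grow the tree by one node
            tree.emplace_back();
            node = 0;                   // start the next piece at the root
        }
    }
    // Assumption: a leftover match at end of input gets a '\0' placeholder.
    if (node != 0) out.emplace_back(node, '\0');
    return out;
}

// Rebuild the message by replaying the pairs against the growing piece table.
std::string lzDecode(const std::vector<Code>& codes) {
    std::vector<std::string> pieces(1);  // pieces[0] is the null string
    std::string msg;
    for (const Code& code : codes) {
        assert(code.first >= 0 &&
               static_cast<std::size_t>(code.first) < pieces.size());
        std::string piece = pieces[code.first];
        if (code.second != '\0') piece += code.second;
        pieces.push_back(piece);
        msg += piece;
    }
    return msg;
}
```

Applied to the message a|aa|b|ab|bb|aaa|ba|aaaa|aab|aabb from the earlier slide, lzEncode yields the ten pairs 0a, 1a, 0b, 1b, 3b, 2a, 3a, 6a, 2b, 9b.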
Exercise #2
• Encode the file containing the following characters, drawing
the corresponding digital tree
Reclaiming Space in
Files
Motivation
• Let us consider a file of records (fixed length or
variable length)
• We know how to create a file, how to add records to
a file, and how to modify the contents of a record. These
actions can be performed physically by using the various
basic file operations we have seen.
• What happens if records need to be deleted?
• There is no basic operation that allows us to remove
part of a file.
Motivation
• Modification of a variable-length record (the new record
is longer than the original record):
1. append the extra data to the end of the file and
put a pointer from the original record space to
the extension => slower access
2. rewrite the whole record at the end of the file (if
the file is not sorted), leaving a hole at the original
location => wasted space
• Record deletion should be taken care of by the
program responsible for file organization
Reclaiming Space in Files
• Three forms of modification
1. record addition
2. record updating (treated as a deletion followed by an addition)
3. record deletion
Record Deletion and Storage
Compaction
• Approach to record deletion:
place a special mark in a designated field of each deleted
record, e.g. an asterisk in the first field (Fig. (a), (b))
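The marking approach, followed by a compaction pass that copies only the unmarked records, can be sketched as below. This is a simplified illustration under stated assumptions: the file is modeled as an in-memory string of fixed-length records, and kRecordSize, markDeleted and compact are hypothetical names, not part of the course code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed fixed record length for this sketch.
constexpr std::size_t kRecordSize = 8;

// Mark the record with the given RRN as deleted by writing an asterisk
// into its first byte. The space is not reclaimed yet.
void markDeleted(std::string& file, std::size_t rrn) {
    assert((rrn + 1) * kRecordSize <= file.size());
    file[rrn * kRecordSize] = '*';
}

// Storage compaction: build a new file image that contains only the
// records whose first byte is not the deletion mark.
std::string compact(const std::string& file) {
    std::string out;
    for (std::size_t pos = 0; pos + kRecordSize <= file.size();
         pos += kRecordSize) {
        if (file[pos] != '*') out.append(file, pos, kRecordSize);
    }
    return out;
}
```

Marking is cheap (one byte is overwritten), while compaction rewrites the whole file, so in this sketch the two steps are kept separate and compaction can be run only occasionally.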
Binary Searching,
Keysorting &
Indexing
Content
• Binary Searching
• Keysorting
• Introduction to Indexing
Binary Searching
• Let us consider fixed-length records that must be
searched by a key value
• If we knew the RRN of the record identified by this key
value, we could jump directly to the record (by using
the Seek function)
• In practice, we do not have this information and we
must search for the record containing this key value
• If the file is not sorted by the key value we may have to
look at every possible record before we find the
desired record
• An alternative to this is to maintain the file sorted by
key value and use binary searching
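Direct access by RRN amounts to seeking to byte offset RRN * record size. A minimal sketch, assuming fixed-length records and using the standard std::istream interface; readByRrn and exampleSecondRecord are hypothetical names, and a std::istringstream stands in for a real file:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read the fixed-length record with the given RRN into rec.
// Returns false if the file does not contain a complete record there.
bool readByRrn(std::istream& file, std::size_t rrn, std::size_t recordSize,
               std::string& rec) {
    assert(recordSize > 0);
    file.clear();  // drop any error state left by an earlier read
    file.seekg(static_cast<std::streamoff>(rrn * recordSize), std::ios::beg);
    rec.resize(recordSize);
    file.read(&rec[0], static_cast<std::streamsize>(recordSize));
    return file.gcount() == static_cast<std::streamsize>(recordSize);
}

// Example: fetch the record with RRN 1 from a three-record in-memory file.
std::string exampleSecondRecord() {
    std::istringstream file("AAAABBBBCCCC");
    std::string rec;
    bool ok = readByRrn(file, 1, 4, rec);
    assert(ok);
    (void)ok;
    return rec;
}
```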
Binary Search Algorithm
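bool BinarySearch(Stream& file, RecordType& rec, const KeyType& key)
{
    // Fixed-length records: record count = file length / record size.
    int low = 0, high = getFileLength(file) / sizeof(RecordType) - 1;
    int guess;
    while (low <= high)
    {
        guess = (high + low) / 2;       // RRN of the middle record
        readRecord(file, rec, guess);   // read the record at that RRN
        if (rec.key() == key)
            return true;                // found: rec holds the record
        if (rec.key() > key)
            high = guess - 1;           // continue in the lower half
        else
            low = guess + 1;            // continue in the upper half
    }
    return false;                       // key is not in the file
}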
Binary Search vs. Sequential Search
• Sequential Search: O(n)
• Binary Search: O(log2 n)
• If the file size is doubled, sequential search time is
doubled, while binary search needs only 1 more comparison
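The comparison above can be made concrete by counting worst-case record reads. The sketch below is illustrative only; sequentialWorstCase and binaryWorstCase are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Worst case for sequential search: every one of the n records is read.
std::uint64_t sequentialWorstCase(std::uint64_t n) { return n; }

// Worst case for binary search: the search range is halved until it is
// empty, which takes floor(log2 n) + 1 reads for n >= 1.
std::uint64_t binaryWorstCase(std::uint64_t n) {
    assert(n >= 1);
    std::uint64_t reads = 0;
    while (n > 0) {
        ++reads;
        n /= 2;
    }
    return reads;
}
```

For a file of 1000 records this gives 1000 reads against 10; for 2000 records it gives 2000 against 11.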
Keysorting
• Suppose a file needs to be sorted, but it is too big
to fit into main memory.
• To sort the file, we only need the keys.
• Suppose that all the keys fit into main memory
• Idea
– Bring the keys, each with its corresponding RRN,
into main memory
– Sort the keys internally
– Rewrite the file in sorted order
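The three steps of the idea can be sketched as follows. This is a simplified model under stated assumptions: the file is a std::vector of records whose position is the RRN, the key is assumed to occupy the first keyLen characters of each record, and KeyNode and keysort are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One in-memory entry per record: the key and where the record lives.
struct KeyNode {
    std::string key;
    std::size_t rrn;
};

std::vector<std::string> keysort(const std::vector<std::string>& file,
                                 std::size_t keyLen) {
    // Step 1: one sequential pass collects (key, RRN) pairs.
    std::vector<KeyNode> nodes;
    nodes.reserve(file.size());
    for (std::size_t rrn = 0; rrn < file.size(); ++rrn)
        nodes.push_back({file[rrn].substr(0, keyLen), rrn});

    // Step 2: internal sort of the keys only; the records do not move.
    std::stable_sort(nodes.begin(), nodes.end(),
                     [](const KeyNode& a, const KeyNode& b) {
                         return a.key < b.key;
                     });

    // Step 3: fetch each record by RRN in key order and write it out.
    std::vector<std::string> sorted;
    sorted.reserve(file.size());
    for (const KeyNode& n : nodes) {
        assert(n.rrn < file.size());
        sorted.push_back(file[n.rrn]);
    }
    return sorted;
}
```

In a real file, step 3 is where the cost lies: each file[n.rrn] access corresponds to a seek to an arbitrary position.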
Example
How much work must we do?
• Read the file sequentially once
• Visit each record in sorted order, which means
one random-access seek per record
• Write each record once (sequentially)
Why bother to write the file back?
• Use the keynode array to create an index file instead.
[Figure: the index, whose entries hold the address of each record]
Index
• Index is sorted (main memory)
• Records appear in the file in the order they were entered
• How to search for a recording with a given LABEL ID?
– Binary search (in main memory) in the index:
find the LABEL ID, which leads us to the
reference field
– Seek to the record at the position given by the
reference field
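The two-step search (binary search in the index, then a seek in the data file) can be sketched as below. The sketch assumes fixed-length records and models the data file as an in-memory string; IndexEntry, lookup and fetch are hypothetical names, and std::lower_bound from the standard library performs the binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;      // e.g. a label ID
    std::size_t address;  // reference field: byte offset of the record
};

// Step 1: binary search in the sorted in-memory index.
bool lookup(const std::vector<IndexEntry>& index, const std::string& key,
            std::size_t& address) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it == index.end() || it->key != key) return false;
    address = it->address;
    return true;
}

// Step 2: go to the referenced position and read recLen bytes.
bool fetch(const std::vector<IndexEntry>& index, const std::string& dataFile,
           std::size_t recLen, const std::string& key, std::string& rec) {
    std::size_t address = 0;
    if (!lookup(index, key, address)) return false;
    assert(address + recLen <= dataFile.size());
    rec = dataFile.substr(address, recLen);
    return true;
}
```

The data file stays in entry order; only the index has to be kept sorted.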
Some Issues
• How to make a persistent index
– i.e. how to store the index into a file when it
is not in main memory
• How to guarantee that the index is an accurate
reflection of the contents of the file
– This is tricky when there are lots of additions,
deletions and updates
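One simple way to make the index persistent is to write it out as text, one entry per line, and read it back when the program starts. The sketch below shows only that idea and does not address the consistency issue; IndexEntry, saveIndex, loadIndex and roundTrip are hypothetical names, and keys are assumed to be non-empty and free of whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    std::size_t address;
};

// Write one "key address" line per entry.
void saveIndex(std::ostream& out, const std::vector<IndexEntry>& index) {
    for (const IndexEntry& e : index) {
        // Assumption of this text format: keys contain no whitespace.
        assert(!e.key.empty() &&
               e.key.find_first_of(" \t\n") == std::string::npos);
        out << e.key << ' ' << e.address << '\n';
    }
}

// Read entries back in the order they were written.
std::vector<IndexEntry> loadIndex(std::istream& in) {
    std::vector<IndexEntry> index;
    IndexEntry e;
    while (in >> e.key >> e.address) index.push_back(e);
    return index;
}

// Round trip through an in-memory stream; a std::fstream opened on an
// index file could be passed to the same two functions.
std::vector<IndexEntry> roundTrip(const std::vector<IndexEntry>& index) {
    std::stringstream buffer;
    saveIndex(buffer, index);
    return loadIndex(buffer);
}
```

If the index is saved in sorted order, it is still sorted after loading, so it can be searched immediately.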
End of Chapter 6