Unit-5 Storage and Indexing
Primary Memory
Secondary Memory
Tertiary Memory
PRIMARY MEMORY
Volume of Store
secondary storage:
on-line storage
Non-volatile
Moderately fast
tertiary storage:
off-line storage
Non-volatile
Slow
In the storage hierarchy, the higher levels are expensive but
fast. Moving down, the cost per bit decreases and the access
time increases. Also, the storage media from main memory
upward are volatile, while all devices below main memory
are non-volatile.
DATA ON EXTERNAL STORAGE
Disks: Can retrieve random page at fixed cost
But reading several consecutive pages is much
cheaper than reading them in random order
Tapes: Can only read pages in sequence
Cheaper than disks; used for archival storage
File organization: Method of arranging a file of
records on external storage.
Record id (rid) is sufficient to physically locate a
record
Indexes are data structures that allow us to find
the record ids of records with given values in
the index's search key fields
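A minimal sketch of the rid and index ideas above, assuming a rid is a (page number, slot number) pair and the file is an in-memory list of pages. The names pages, build_index and fetch, and the sample records, are hypothetical and only for illustration.

```python
from collections import defaultdict

# Hypothetical file: a list of pages, each page a list of (name, age) records.
pages = [
    [("Ann", 31), ("Bob", 25)],
    [("Cid", 31), ("Dee", 40)],
]

def build_index(pages, field):
    """Map each value of the search key field to the rids of matching records."""
    index = defaultdict(list)
    for page_no, page in enumerate(pages):
        for slot_no, record in enumerate(page):
            index[record[field]].append((page_no, slot_no))
    return index

def fetch(pages, rid):
    """A rid (page number, slot number) is enough to locate the record."""
    page_no, slot_no = rid
    return pages[page_no][slot_no]

# Index on the age field (position 1), then look up all records with age 31.
age_index = build_index(pages, field=1)
matches = [fetch(pages, rid) for rid in age_index[31]]
print(matches)
```

The index stores only rids, not records, so the same file can carry several indexes on different search key fields.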
The DBMS Components that read and write data from
main memory are:
Pros –
Tree traversal is easier and faster.
Searching is easy because all records are stored only in
the leaf nodes, which form a sorted, sequentially linked
list.
There is no restriction on B+ tree size. It may
grow or shrink as the amount of data increases or decreases.
Cons –
Inefficient for static tables.
CLUSTERED FILE ORGANIZATION
Dense Index
Sparse Index
DENSE INDEX
OPERATIONS TO COMPARE
COST MODEL FOR OUR ANALYSIS
ISAM
Index entry layout: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm
Figure labels: non-leaf pages; leaf pages (primary pages, with overflow pages chained to them)
Index pages: 20 33 51 63
Leaf pages: 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
AFTER INSERTING 23*, 48*, 41*, 42* ...
Root: 40
Index pages: 20 33 51 63
Primary leaf pages: 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
Overflow pages (chained to the primary leaf pages): 23*, 48*, 41*, 42*
... THEN DELETING 42*, 51*, 97*
Root: 40
Index pages: 20 33 51 63
Primary leaf pages: 10* 15* 20* 27* 33* 37* 40* 46* 55* 63*
Overflow pages: 23*, 48*, 41*
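The ISAM example above can be sketched in code. This is a simplified illustration, not a full implementation. It assumes each primary leaf page holds two entries, flattens the two index levels into one sorted list of separator keys, and models an overflow chain as a plain list. The class and method names are hypothetical.

```python
import bisect

class LeafPage:
    def __init__(self, entries, capacity=2):
        self.entries = list(entries)  # primary page, allocated once
        self.capacity = capacity      # assumed page capacity
        self.overflow = []            # stands in for a chain of overflow pages

class Isam:
    def __init__(self, leaf_groups, capacity=2):
        self.leaves = [LeafPage(group, capacity) for group in leaf_groups]
        # Static separators: first key of every leaf except the first.
        # They are never changed by later inserts or deletes.
        self.separators = [group[0] for group in leaf_groups[1:]]

    def _leaf_for(self, key):
        return self.leaves[bisect.bisect_right(self.separators, key)]

    def insert(self, key):
        leaf = self._leaf_for(key)
        if len(leaf.entries) < leaf.capacity:
            bisect.insort(leaf.entries, key)
        else:
            leaf.overflow.append(key)  # primary page full: use overflow

    def delete(self, key):
        leaf = self._leaf_for(key)
        if key in leaf.overflow:
            leaf.overflow.remove(key)
        elif key in leaf.entries:
            leaf.entries.remove(key)   # the index pages are left untouched

    def search(self, key):
        leaf = self._leaf_for(key)
        return key in leaf.entries or key in leaf.overflow

tree = Isam([[10, 15], [20, 27], [33, 37], [40, 46], [51, 55], [63, 97]])
for k in (23, 48, 41, 42):
    tree.insert(k)
for k in (42, 51, 97):
    tree.delete(k)
print(tree.separators, [(p.entries, p.overflow) for p in tree.leaves])
```

Note that 51 still appears among the separators after 51* is deleted, because the index pages are static.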
Index Entries (Direct search)
Data Entries ("Sequence set")
EXAMPLE B+ TREE
Root: 13 17 24 30
Leaf entries: 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
[Figure: 8* is inserted into the leaf 2* 3* 5* 7*; caption fragment: "... occupancy is guaranteed in ..."]
EXAMPLE B+ TREE AFTER INSERTING 8*
Root: 17
Index entries: 5 13 24 30
Leaf entries: 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
DELETING A DATA ENTRY FROM A B+ TREE
EXAMPLE TREE AFTER (INSERTING 8*, THEN) DELETING 19* AND 20* ...
Root: 17
Index entries: 5 13 27 30
Leaf entries: 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*
... AND THEN DELETING 24*
❖ Must merge.
❖ Observe 'toss' of index entry (on right), and 'pull down' of index entry (below).
On right: index entry 30 over leaf entries 22* 27* 29* 33* 34* 38* 39*
Below: Root: 5 13 17 30
EXAMPLE 2: SEARCHING A RECORD IN B+ TREE
Suppose we have to search for 55 in the accompanying B+ tree structure.
First, we look in the intermediary node, which directs us to the leaf
node that may contain the record for 55.
In the intermediary node, 55 falls on the branch between the keys 50
and 75, so we are directed to the third leaf node. There the DBMS
performs a sequential search to find 55.
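The search steps above can be sketched as follows. The tree is a hypothetical two-level B+ tree: the intermediary keys 25, 50, 75 and all leaf values other than 50 and 55 are assumptions made for illustration, since the original figure is not reproduced here.

```python
import bisect

# Hypothetical two-level tree: one intermediary node over four leaf nodes.
intermediary_keys = [25, 50, 75]
leaves = [
    [10, 15, 20],
    [25, 30, 45],
    [50, 55, 65, 70],
    [75, 80, 90],
]

def find_leaf(key):
    """Return the position of the leaf whose key range covers `key`."""
    return bisect.bisect_right(intermediary_keys, key)

def search(key):
    """Follow the intermediary node, then scan the chosen leaf sequentially."""
    leaf = leaves[find_leaf(key)]
    for entry in leaf:
        if entry == key:
            return True
    return False

print(find_leaf(55), search(55))
```

Here 55 lies between 50 and 75, so find_leaf returns position 2 (the third leaf), and the sequential scan of that leaf finds 55.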
B+ TREE INSERTION
Suppose we want to insert a record 60 into the accompanying
structure. It belongs in the 3rd leaf node, after 55. The tree is
balanced and this leaf node is already full, so we cannot simply
insert 60 there.
In this case, we have to split the leaf node so that 60 can be
inserted without affecting the fill factor, balance and order of
the tree.
With 60 added, the 3rd leaf node holds the values (50, 55, 60, 65, 70),
and the key in the intermediate node that leads to it is 50. We split
the leaf node in the middle so that the balance of the tree is not
altered, grouping the values into two leaf nodes: (50, 55) and
(60, 65, 70).
For these two to be separate leaf nodes, the intermediate node
cannot branch from 50 alone. It must have 60 added to it, with a
pointer to the new leaf node.
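The leaf split described above can be sketched as follows. This is a simplified illustration: it assumes a leaf holds at most four keys, uses hypothetical values for the other leaves and parent keys, and does not handle the case where the intermediate node itself overflows.

```python
import bisect

MAX_KEYS = 4  # assumed leaf capacity

def insert_into_leaf(parent_keys, leaves, key):
    """Insert `key`; split the leaf and copy a key up if the leaf overflows."""
    idx = bisect.bisect_right(parent_keys, key)
    leaf = leaves[idx]
    bisect.insort(leaf, key)
    if len(leaf) <= MAX_KEYS:
        return
    # Split in the middle: (50, 55, 60, 65, 70) -> (50, 55) and (60, 65, 70).
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    leaves[idx] = left
    leaves.insert(idx + 1, right)
    # Copy the smallest key of the new right leaf up into the parent.
    parent_keys.insert(idx, right[0])

# Hypothetical starting state (only the 3rd leaf comes from the example).
parent_keys = [25, 50, 75]
leaves = [[10, 15, 20], [25, 30, 45], [50, 55, 65, 70], [75, 80, 90]]

insert_into_leaf(parent_keys, leaves, 60)
print(parent_keys)
print(leaves)
```

After the call, the parent holds 60 as an extra key and the old 3rd leaf has become two leaves, matching the grouping in the text.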
B+ TREE DELETION
Suppose we want to delete 60 from the above example. In this
case, we have to remove 60 from the intermediate node as
well as from the 4th leaf node. If we simply remove it from the
intermediate node, the tree will no longer satisfy the rules of
a B+ tree, so we need to rearrange the nodes to keep the tree balanced.
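One possible way to carry out this deletion is sketched below. It is an assumption, not necessarily the rearrangement shown in the original figure: it assumes leaves hold between two and four keys, replaces a removed separator with the leaf's new smallest key, and merges with or borrows from the left sibling on underflow. The starting values other than the 3rd and 4th leaves are hypothetical.

```python
import bisect

MAX_KEYS = 4  # assumed leaf capacity
MIN_KEYS = 2  # assumed minimum occupancy

def delete_from_leaf(parent_keys, leaves, key):
    """Delete `key` from its leaf and repair the parent keys."""
    idx = bisect.bisect_right(parent_keys, key)
    leaf = leaves[idx]
    if key not in leaf:
        return False
    leaf.remove(key)
    if idx == 0:
        return True  # leftmost leaf: underflow is not handled in this sketch
    left = leaves[idx - 1]
    if len(leaf) >= MIN_KEYS:
        # No underflow: keep the separator equal to the leaf's smallest key.
        parent_keys[idx - 1] = leaf[0]
    elif len(left) + len(leaf) <= MAX_KEYS:
        # Underflow: merge into the left sibling and drop the separator.
        left.extend(leaf)
        del leaves[idx]
        del parent_keys[idx - 1]
    else:
        # Underflow but no room to merge: borrow the left sibling's largest key.
        leaf.insert(0, left.pop())
        parent_keys[idx - 1] = leaf[0]
    return True

parent_keys = [25, 50, 60, 75]
leaves = [[10, 15, 20], [25, 30, 45], [50, 55], [60, 65, 70], [75, 80, 90]]

delete_from_leaf(parent_keys, leaves, 60)
state_after_60 = (list(parent_keys), [list(leaf) for leaf in leaves])
delete_from_leaf(parent_keys, leaves, 65)
print(state_after_60)
print(parent_keys, leaves)
```

Deleting 60 leaves the 4th leaf with two keys, so only the separator changes. Deleting 65 afterwards causes an underflow, and the leaf is merged into its left sibling.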
After deleting 60 from the above B+ tree and rearranging
the nodes, the tree will appear as follows: