Lec7 - B-Trees

B-Trees are m-way search trees used for disk-based data structures. They minimize disk accesses for operations like insertion and search. Nodes can have between m/2 and m children, allowing for efficient splitting and merging as keys are added or removed. Common operations take O(h) time where h is the tree height, which is typically low for large m.

Uploaded by

Nour Hesham

B-Trees

1
B-Trees
• Considerations for disk-based storage systems
• Indexed Sequential Access Method (ISAM)
• m-way search trees
• B-trees
2
Data Layout on Disk
• Track: one ring
• Sector: one pie-shaped piece.
• Block: intersection of a track and a sector.

3
Considerations for Disk-Based Dictionary Structures
• Use a disk-based method when the dictionary is too big to fit in RAM at once.
• Minimize the expected or worst-case number of disk accesses for the essential operations (put, get, remove).
• Keep space requirements reasonable: O(n).
• Methods based on binary trees, such as red-black search trees, are not optimal for disk-based representations. The number of disk accesses can be greatly reduced by using m-way search trees.
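
To see why a larger fan-out reduces disk accesses, the following sketch counts the fewest levels a search tree needs to hold a given number of keys, assuming roughly one disk access per level visited. The function name `min_levels` and the sample size of one million keys are illustrative assumptions, not part of the lecture.

```python
def min_levels(n: int, fanout: int) -> int:
    """Fewest levels a search tree with the given fanout needs to hold n keys."""
    # A completely full tree with `levels` levels holds fanout**levels - 1 keys.
    levels = 0
    capacity = 0
    while capacity < n:
        levels += 1
        capacity = fanout ** levels - 1
    return levels


n = 1_000_000
binary_levels = min_levels(n, 2)    # best case for a balanced binary tree
mway_levels = min_levels(n, 128)    # best case for a 128-way search tree
print(binary_levels, mway_levels)
```

For one million keys this gives 20 levels for a binary tree and 3 levels for a 128-way tree.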
4
Indexed Sequential Access Method (ISAM)
• Store m records in each disk block.
• Use an index that consists of an array with one element for each disk block, holding a copy of the largest key that occurs in that block.

5
ISAM (Continued)

[Figure: ISAM index array holding the largest key of each disk block: 1.7, 5.1, 21.2, 26.8, ...]

6
ISAM (Continued)
To perform a get(k) operation:
• Look in the index using, say, either a sequential search or a binary search, to determine which disk block should hold the desired record.
• Then perform one disk access to read that block, and extract the desired record, if it exists.
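
The two steps of get(k) can be sketched as below. This is a minimal illustration, not a real ISAM implementation: the "disk" is a hypothetical in-memory list of blocks, the index values 1.7, 5.1, 21.2 and 26.8 come from the slide, and every other key and record is invented for the example.

```python
from bisect import bisect_left

# Hypothetical stand-in for the disk: each block is a sorted list of (key, record).
blocks = [
    [(0.4, "a"), (1.1, "b"), (1.7, "c")],
    [(2.3, "d"), (4.0, "e"), (5.1, "f")],
    [(9.9, "g"), (15.0, "h"), (21.2, "i")],
    [(22.5, "j"), (26.8, "k")],
]
# The index holds a copy of the largest key in each block.
index = [block[-1][0] for block in blocks]


def isam_get(key):
    """Return the record stored under key, or None if it is absent."""
    # Step 1: binary search the index for the first block whose largest key >= key.
    i = bisect_left(index, key)
    if i == len(index):
        return None  # key is larger than every stored key
    # Step 2: one "disk access" to read that block, then scan it.
    for k, record in blocks[i]:
        if k == key:
            return record
    return None


print(isam_get(4.0), isam_get(3.0))
```

Only the block selected by the index is ever read, which is why a lookup costs a single disk access as long as the index itself fits in RAM.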

7
ISAM Limitations
Problems with ISAM:
• What if the index itself is too large to fit entirely in RAM at the same time?
• Insertion and deletion could be very expensive if all records after the inserted or deleted one have to shift up or down, crossing block boundaries.

8
A Solution: B-Trees
• Idea 1: Use m-way search trees. (ISAM uses a root and one level under the root.) m-way search trees can be as high as we need.
• Idea 2: Don’t require that each node always be full. Empty space will permit insertion without rebalancing. Allowing empty space after a deletion can also avoid rebalancing.
• Idea 3: Rebalancing will sometimes be necessary: figure out how to do it in time proportional to the height of the tree.

9
B-Tree Example with m = 5

Root: [12]
Children: [2 3 8] [13 27]

• The root has between 2 and m children.
• Each non-root internal node has between ⌈m/2⌉ and m children.
• All external nodes are at the same level. (External nodes are actually represented by null pointers in implementations.)
10
Insert 10

Root: [12]
Children: [2 3 8 10] [13 27]

• We find the location for 10 by following a path from the root, using the stored key values to guide the search.
• The search falls out of the tree at the 4th child of the 1st child of the root.
• The 1st child of the root has room for the new element, so we store it there.
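
The search that locates the slot for 10 can be sketched as follows. The `Node` class and the `search` function are hypothetical names chosen for this illustration; leaves are modelled as nodes with an empty child list rather than with null external pointers.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted list of keys
        self.children = children or []    # empty list for a leaf


def search(node, key):
    """Return (node, position, found).

    If found, key == node.keys[position]. Otherwise node is the leaf where
    the search fell out of the tree and position is the 0-based child slot.
    """
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node, i, True
        if not node.children:
            return node, i, False
        node = node.children[i]


root = Node([12], [Node([2, 3, 8]), Node([13, 27])])
leaf, pos, found = search(root, 10)
print(leaf.keys, pos, found)
```

Here `pos` is 3, which is the 4th child slot of the leaf `[2 3 8]`, matching the slide. Inserting 10 at that position keeps the leaf sorted.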
11
Insert 11

Root: [12]
Children: [2 3 8 10 11] [13 27]

We fall out of the tree at the child to the right of key 10.
But there is no more room in the left child of the root to hold 11.
Therefore, we must split this node...

12
Insert 11 (Continued)

Root: [8 12]
Children: [2 3] [10 11] [13 27]

• The m + 1 children are divided evenly between the old and new nodes.
• The parent gets one new child. (If the parent becomes overfull, then it, too, will have to be split.)
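
A minimal sketch of this split, assuming the overfull node is a leaf so that each node can be modelled as a plain Python list of keys. The function name `split_child` is hypothetical, and the sketch does not check whether the parent itself becomes overfull.

```python
def split_child(parent_keys, parent_children, i):
    """Split the overfull leaf parent_children[i] around its middle key.

    The middle key moves up into the parent; the remaining keys are divided
    between the old node and a new right sibling.
    """
    child = parent_children[i]
    mid = len(child) // 2
    left, up, right = child[:mid], child[mid], child[mid + 1:]
    parent_keys.insert(i, up)
    parent_children[i:i + 1] = [left, right]


# State from the slides just after 11 was placed in the left child (m = 5).
keys = [12]
children = [[2, 3, 8, 10, 11], [13, 27]]
split_child(keys, children, 0)
print(keys, children)
```

After the call the parent holds `[8, 12]` and has the three children `[2, 3]`, `[10, 11]` and `[13, 27]`, as on the slide.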

13
Remove 8

Root: [8 12]
Children: [2 3] [10 11] [13 27]

• Removing 8 might force us to move another key up from one of the children. It could either be the 3 from the 1st child or the 10 from the second child.
• However, neither child has more than the minimum number of children (3), so the two nodes will have to be merged. Nothing moves up.
14
Remove 8 (Continued)

Root: [12]
Children: [2 3 10 11] [13 27]

The root contains one fewer key, and has one fewer child.
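
The Remove 8 case can be sketched as follows for the situation on the slides, where the two children around the removed key are leaves at minimum occupancy. Nodes are modelled as plain lists of keys, and the function name is hypothetical.

```python
def remove_key_by_merging(parent_keys, parent_children, i):
    """Delete parent_keys[i] and merge the two leaves on either side of it."""
    del parent_keys[i]
    merged = parent_children[i] + parent_children[i + 1]
    parent_children[i:i + 2] = [merged]


keys = [8, 12]
children = [[2, 3], [10, 11], [13, 27]]
remove_key_by_merging(keys, children, 0)
print(keys, children)
```

The parent ends up with one fewer key and one fewer child: `[12]` over `[2, 3, 10, 11]` and `[13, 27]`.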

15
Remove 13

Root: [12]
Children: [2 3 10 11] [13 27]

• Removing 13 would cause the node containing it to become underfull.
• To fix this, we try to reassign one key from a sibling that has spares.

16
Remove 13 (Cont)

Root: [11]
Children: [2 3 10] [12 27]

• The 13 is replaced by the parent’s key 12.
• The parent’s key 12 is replaced by the spare key 11 from the left sibling.
• The sibling has one fewer element.
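
This reassignment through the parent can be sketched as below, again with leaves modelled as plain lists. The name `borrow_from_left` is hypothetical, and the sketch assumes the left sibling really does have a spare key.

```python
def borrow_from_left(parent_keys, parent_children, i):
    """Repair underfull leaf parent_children[i] using its left sibling."""
    left, node = parent_children[i - 1], parent_children[i]
    node.insert(0, parent_keys[i - 1])   # separator key moves down
    parent_keys[i - 1] = left.pop()      # sibling's largest key moves up


# State after 13 has been deleted from the right leaf, leaving it underfull.
keys = [12]
children = [[2, 3, 10, 11], [27]]
borrow_from_left(keys, children, 1)
print(keys, children)
```

The result is the parent `[11]` over `[2, 3, 10]` and `[12, 27]`, matching the slide.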

17
Remove 11

Root: [11]
Children: [2 3 10] [12 27]

• 11 is in a non-leaf, so replace it by the value immediately preceding: 10.
• 10 is in a leaf, and this node has spares, so just delete it there.
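
A sketch of the predecessor replacement, under the assumption that the leaf holding the predecessor has a spare key (as it does here). The `Node` class and function name are hypothetical; the leaf is returned so that a caller could check it for underflow.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list for a leaf


def replace_with_predecessor(node, i):
    """Overwrite node.keys[i] with its in-order predecessor.

    The predecessor is the largest key in the rightmost leaf of the
    subtree to the left of keys[i]; it is removed from that leaf.
    """
    leaf = node.children[i]
    while leaf.children:
        leaf = leaf.children[-1]
    node.keys[i] = leaf.keys.pop()
    return leaf


root = Node([11], [Node([2, 3, 10]), Node([12, 27])])
leaf = replace_with_predecessor(root, 0)
print(root.keys, leaf.keys)
```

The root becomes `[10]` and the left leaf shrinks to `[2, 3]`.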

18
Remove 11 (Cont)

Root: [10]
Children: [2 3] [12 27]

19
Remove 2

Root: [10]
Children: [2 3] [12 27]

• Although 2 is at leaf level, removing it leads to an underfull node.
• The node has no left sibling. It does have a right sibling, but that node is at its minimum occupancy already.
• Therefore, the node must be merged with its right sibling.

20
Remove 2 (Cont)

Root: [ ] (no keys)
Child: [3 10 12 27]

The result is illegal, because the root does not have at least 2 children.
Therefore, we must remove the root, making its child the new root.
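
The merge with the right sibling and the removal of the empty root can be sketched as follows. Leaves are plain lists, the function name is hypothetical, and the final `if` stands in for the root replacement described on the slide.

```python
def merge_with_right(parent_keys, parent_children, i):
    """Merge leaf i, the separator key, and leaf i + 1 into a single leaf."""
    merged = parent_children[i] + [parent_keys[i]] + parent_children[i + 1]
    del parent_keys[i]
    parent_children[i:i + 2] = [merged]


# State after 2 has been deleted from the left leaf, leaving it underfull.
root_keys = [10]
root_children = [[3], [12, 27]]
merge_with_right(root_keys, root_children, 0)

# The root now has no keys and a single child, so that child becomes the root.
if not root_keys:
    new_root = root_children[0]
print(new_root)
```

Unlike the Remove 8 case, the separator key 10 moves down into the merged node, so the tree loses a level and the single remaining node is `[3, 10, 12, 27]`.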

21
Remove 2 (Cont)

Root: [3 10 12 27]

The new B-tree has only one node, the root.

22
Insert 49

Root: [3 10 12 27]

Let’s put an element into this B-tree.

23
Insert 49 (Cont)

Root: [3 10 12 27 49]

• Adding this key makes the node overfull, so it must be split into two.
• But this node was the root.
• So we must construct a new root, and make these its children.

24
Insert 49 (Cont)

Root: [12]
Children: [3 10] [27 49]

• The middle key (12) is moved up into the root.
• The result is a B-tree with one more level.
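
Splitting the root differs from an ordinary split only in that a new root has to be created to receive the middle key. A minimal sketch for a root that is still a leaf, with the hypothetical name `split_root`:

```python
def split_root(root_keys):
    """Split an overfull leaf root; return (new_root_keys, children)."""
    mid = len(root_keys) // 2
    new_root_keys = [root_keys[mid]]
    children = [root_keys[:mid], root_keys[mid + 1:]]
    return new_root_keys, children


new_root_keys, children = split_root([3, 10, 12, 27, 49])
print(new_root_keys, children)
```

This is the only way the tree grows in height: at the top, so all external nodes stay at the same level.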

25
B-Tree performance

Let h = height of the B-tree.

• get(k): at most h disk accesses. O(h)
• put(k): at most 3h + 1 disk accesses. O(h)
• remove(k): at most 3h disk accesses. O(h)

$h \le \log_d\left(\frac{n+1}{2}\right) + 1$, where $d = \lceil m/2 \rceil$ (Sahni, p. 641).

• An important point is that the constant factors are relatively low.
• m should be chosen so as to match the maximum node size to the block size on the disk.
• Example: m = 128, d = 64, $n \le 64^3 = 262144$ gives $h \le 4$.
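
The height bound and the access counts can be evaluated numerically. The sketch below simply plugs numbers into the formulas from this slide; the function names are hypothetical.

```python
import math


def height_upper_bound(n, m):
    """Evaluate log_d((n + 1) / 2) + 1 with d = ceil(m / 2)."""
    d = math.ceil(m / 2)
    return math.log((n + 1) / 2, d) + 1


def worst_case_accesses(h):
    """Worst-case disk accesses per operation for a B-tree of height h."""
    return {"get": h, "put": 3 * h + 1, "remove": 3 * h}


bound = height_upper_bound(262144, 128)
print(round(bound, 2))
print(worst_case_accesses(4))
```

For m = 128 and n = 262144 the bound evaluates to roughly 3.83, so the height stays below 4, and with h = 4 the worst cases are 4, 13 and 12 disk accesses for get, put and remove.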

26
2-3 Trees
• A B-tree of order m is a kind of m-way search tree.
• A B-Tree of order 3 is called a 2-3 Tree.
• In a 2-3 tree, each internal node has either 2 or 3 children.
• In practical applications, however, B-Trees of large order (e.g., m = 128) are more common than low-order B-Trees such as 2-3 trees.
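
The name "2-3 tree" follows directly from the child-count rule for order m. A small sketch, with the hypothetical helper `child_bounds`, assuming the minimum for non-root internal nodes is m/2 rounded up:

```python
def child_bounds(m, is_root=False):
    """(min, max) number of children of an internal node in a B-tree of order m."""
    minimum = 2 if is_root else -(-m // 2)   # -(-m // 2) is ceil(m / 2)
    return minimum, m


for order in (3, 5, 128):
    print(order, child_bounds(order))
```

Order 3 gives between 2 and 3 children, order 5 gives between 3 and 5 (the example used earlier in these slides), and order 128 gives between 64 and 128.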
27
