B-Trees: Balanced Tree Data Structures
Table of Contents:
I. Introduction
II. Structure of B-Trees
III. Operations on B-Trees
IV. Examples
V. Applications
VI. Works Cited
Introduction
When working with large sets of data, it is often not possible or desirable to
maintain the entire structure in primary storage (RAM). Instead, a relatively
small portion of the data structure is maintained in primary storage, and
additional data is read from secondary storage as needed. Unfortunately, a
magnetic disk, the most common form of secondary storage, is significantly
slower than random access memory (RAM). In fact, the system often spends
more time retrieving data than actually processing data.
B-trees are balanced trees that are optimized for situations when part or all
of the tree must be maintained in secondary storage such as a magnetic disk.
Since disk accesses are expensive (time consuming) operations, a b-tree tries
to minimize the number of disk accesses. For example, a b-tree with a height
of 2 and a branching factor of 1001 can store over one billion keys but
requires at most two disk accesses to search for any key, provided the root
node is kept in primary storage (Cormen 384).
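The one-billion figure can be checked with a few lines of arithmetic. The sketch below assumes every node is completely full (1001 children and therefore 1000 keys each); the function name max_keys is made up for this illustration.

```python
def max_keys(branching_factor: int, height: int) -> int:
    """Keys stored when every node has `branching_factor` children
    and therefore `branching_factor - 1` keys (a completely full tree)."""
    keys_per_node = branching_factor - 1
    # Level d (the root is level 0) holds branching_factor ** d nodes.
    node_count = sum(branching_factor ** depth for depth in range(height + 1))
    return keys_per_node * node_count


total = max_keys(branching_factor=1001, height=2)
print(total)  # 1003003000
```

The tree has 1 + 1001 + 1001^2 = 1,003,003 nodes of 1000 keys each, which is where the "over one billion" comes from.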
The Structure of B-Trees
A b-tree has a minimum number of allowable children for each node known
as the minimization factor (also called the minimum degree). If t is this
minimization factor, every node other than the root must have at least t - 1
keys. The root node is allowed to violate this property by having fewer than
t - 1 keys; in a nonempty tree it needs only one key. Every node
may have at most 2t - 1 keys or, equivalently, 2t children.
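As a minimal sketch, the key-count rule can be written as a small predicate. The function name and the is_root flag are assumptions made for this illustration; the root's lower bound is taken as zero here so that a freshly created, empty tree also passes.

```python
def key_count_is_valid(num_keys: int, t: int, is_root: bool = False) -> bool:
    """Check a node's key count against the limits for minimization factor t."""
    if t < 2:
        raise ValueError("the minimization factor t must be at least 2")
    # Assumption: the root may hold as few as 0 keys (an empty tree).
    min_keys = 0 if is_root else t - 1
    max_keys = 2 * t - 1
    return min_keys <= num_keys <= max_keys


print(key_count_is_valid(1, t=3))                # False: below t - 1 = 2
print(key_count_is_valid(1, t=3, is_root=True))  # True: the root is exempt
print(key_count_is_valid(6, t=3))                # False: above 2t - 1 = 5
```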
Since each node tends to have a large branching factor (a large number of
children), it is typically necessary to traverse relatively few nodes before
locating the desired key. If access to each node requires a disk access, then a
b-tree will minimize the number of disk accesses required. The minimization
factor is usually chosen so that the total size of each node corresponds to a
multiple of the block size of the underlying storage device. This choice
simplifies and optimizes disk access. Consequently, a b-tree is an ideal data
structure for situations where all data cannot reside in primary storage and
accesses to secondary storage are comparatively expensive (or time
consuming).
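To make the block-size choice concrete, the following sketch computes the largest t for which a full node fits in a single block. The 4096-byte block and the 8-byte key and pointer sizes are hypothetical values chosen for the example, and per-node bookkeeping (such as the key count and leaf flag) is ignored.

```python
def largest_min_degree(block_size: int, key_size: int, pointer_size: int) -> int:
    """Largest t such that 2t - 1 keys plus 2t child pointers fit in one block."""
    # (2t - 1) * key_size + 2t * pointer_size <= block_size
    # => t <= (block_size + key_size) / (2 * (key_size + pointer_size))
    t = (block_size + key_size) // (2 * (key_size + pointer_size))
    if t < 2:
        raise ValueError("block too small for a node with t >= 2")
    return t


t = largest_min_degree(block_size=4096, key_size=8, pointer_size=8)
print(t, 2 * t - 1, 2 * t)  # 128 255 256
```

With these assumed sizes a full node uses 255 * 8 + 256 * 8 = 4088 bytes, leaving 8 bytes of the block unused.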
Height of B-Trees
For n greater than or equal to one, the height h of an n-key b-tree T
with a minimum degree t greater than or equal to 2 satisfies
$$h \le \log_t \frac{n + 1}{2}.$$
For a proof of the above inequality, refer to Cormen, Leiserson, and Rivest
pages 383-384.
The worst case height is O(log n). Since the "branchiness" of a b-tree can be
large compared to many other balanced tree structures, the base of the
logarithm tends to be large; therefore, the number of nodes visited during a
search tends to be smaller than required by other tree structures. Although
this does not affect the asymptotic worst case height, b-trees tend to have
smaller heights than other trees with the same asymptotic height.
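The effect of the logarithm's base can be seen numerically. The sketch below evaluates the bound from Cormen, h <= log_t((n + 1) / 2), using integer arithmetic to avoid floating-point rounding; the function name max_height is an assumption for this example.

```python
def max_height(n: int, t: int) -> int:
    """Largest h satisfying 2 * t**h <= n + 1, i.e. h <= log_t((n + 1) / 2)."""
    if n < 1 or t < 2:
        raise ValueError("requires n >= 1 and t >= 2")
    h = 0
    while 2 * t ** (h + 1) <= n + 1:
        h += 1
    return h


for t in (2, 10, 1001):
    print(t, max_height(10**9, t))
# 2 28
# 10 8
# 1001 2
```

For one billion keys, raising t from 2 to 1001 lowers the worst-case height from 28 to 2, even though both are O(log n).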
Operations on B-Trees
The algorithms for the search, create, and insert operations are shown below.
Note that these algorithms are single pass; in other words, they do not
traverse back up the tree. Since b-trees strive to minimize disk accesses and
the nodes are usually stored on disk, this single-pass approach will reduce
the number of node visits and thus the number of disk accesses. Simpler
double-pass approaches that move back up the tree to fix violations are
possible.
Since all nodes are assumed to be stored in secondary storage (disk) rather
than primary storage (memory), all references to a given node must be
preceded by a read operation denoted by Disk-Read. Similarly, once a node
is modified and it is no longer needed, it must be written out to secondary
storage with a write operation denoted by Disk-Write. The algorithms below
assume that all nodes referenced in parameters have already had a
corresponding Disk-Read operation. New nodes are created and assigned
storage with the Allocate-Node call. The implementation details of the Disk-
Read, Disk-Write, and Allocate-Node functions are operating system and
implementation dependent.
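For experimentation, the three primitives can be simulated in memory. The sketch below is only a stand-in under stated assumptions: the SimulatedDisk class, its method names, and the dictionary layout of a node are all invented for this example, and a real implementation would serialize nodes to blocks on a storage device.

```python
import copy


class SimulatedDisk:
    """In-memory stand-in for secondary storage that counts accesses."""

    def __init__(self) -> None:
        self._blocks = {}   # block id -> stored node contents
        self._next_id = 0
        self.reads = 0
        self.writes = 0

    def allocate_node(self) -> int:
        """Reserve a fresh block id (plays the role of Allocate-Node)."""
        block_id = self._next_id
        self._next_id += 1
        return block_id

    def disk_write(self, block_id: int, node: dict) -> None:
        """Store a copy of the node (plays the role of Disk-Write)."""
        self._blocks[block_id] = copy.deepcopy(node)
        self.writes += 1

    def disk_read(self, block_id: int) -> dict:
        """Return a copy of the stored node (plays the role of Disk-Read)."""
        self.reads += 1
        return copy.deepcopy(self._blocks[block_id])


disk = SimulatedDisk()
node_id = disk.allocate_node()
disk.disk_write(node_id, {"leaf": True, "keys": [3, 8], "children": []})
node = disk.disk_read(node_id)
print(node["keys"], disk.reads, disk.writes)  # [3, 8] 1 1
```

Copying on every read and write mimics the fact that changes to an in-memory node are not persistent until the node is written back.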
B-Tree-Search(x, k)
    i <- 1
    while i <= n[x] and k > key_i[x]
        do i <- i + 1
    if i <= n[x] and k = key_i[x]
        then return (x, i)
    if leaf[x]
        then return NIL
        else Disk-Read(c_i[x])
             return B-Tree-Search(c_i[x], k)
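A rough Python rendering of B-Tree-Search is sketched below. It assumes the whole tree is held in memory, so the Disk-Read call is reduced to a comment, and it uses 0-based indices rather than the 1-based indices of the pseudocode. The Node class is a hypothetical layout chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


def b_tree_search(x: Node, k: int) -> Optional[Tuple[Node, int]]:
    """Return (node, index) for key k, or None if k is absent."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if x.leaf:
        return None
    # A disk-backed tree would perform Disk-Read on the child here.
    return b_tree_search(x.children[i], k)


root = Node(keys=[10, 20],
            children=[Node([2, 5]), Node([12, 17]), Node([25, 30])])
hit = b_tree_search(root, 17)
print(hit[0].keys, hit[1])      # [12, 17] 1
print(b_tree_search(root, 13))  # None
```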
B-Tree-Create(T)
    x <- Allocate-Node()
    leaf[x] <- TRUE
    n[x] <- 0
    Disk-Write(x)
    root[T] <- x
B-Tree-Split-Child(x, i, y)
    z <- Allocate-Node()
    leaf[z] <- leaf[y]
    n[z] <- t - 1
    for j <- 1 to t - 1
        do key_j[z] <- key_{j+t}[y]
    if not leaf[y]
        then for j <- 1 to t
            do c_j[z] <- c_{j+t}[y]
    n[y] <- t - 1
    for j <- n[x] + 1 downto i + 1
        do c_{j+1}[x] <- c_j[x]
    c_{i+1}[x] <- z
    for j <- n[x] downto i
        do key_{j+1}[x] <- key_j[x]
    key_i[x] <- key_t[y]
    n[x] <- n[x] + 1
    Disk-Write(y)
    Disk-Write(z)
    Disk-Write(x)
The split operation transforms a full node with 2t - 1 keys into two nodes
with t - 1 keys each. Note that one key is moved into the parent node. The B-
Tree-Split-Child algorithm will run in time O(t) where t is constant.
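The split can be sketched in Python with list slicing. This is an in-memory illustration only: the Node class and the split_child signature (which takes t explicitly and finds y as x.children[i]) are assumptions, indices are 0-based, and the three Disk-Write calls are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def split_child(x: Node, i: int, t: int) -> None:
    """Split the full child x.children[i] (2t - 1 keys) around its median."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "child must be full"
    z = Node(keys=y.keys[t:], children=y.children[t:])  # upper t - 1 keys
    median = y.keys[t - 1]
    y.keys = y.keys[:t - 1]        # lower t - 1 keys stay in y
    y.children = y.children[:t]    # stays empty when y is a leaf
    x.keys.insert(i, median)       # the median moves up into the parent
    x.children.insert(i + 1, z)    # z becomes y's right sibling


parent = Node(keys=[50], children=[Node([10, 20, 30]), Node([60])])
split_child(parent, 0, t=2)
print(parent.keys)                        # [20, 50]
print([c.keys for c in parent.children])  # [[10], [30], [60]]
```

The list insertions shift the parent's keys and children to the right, which is the work done by the two downto loops in the pseudocode.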
B-Tree-Insert(T, k)
    r <- root[T]
    if n[r] = 2t - 1
        then s <- Allocate-Node()
             root[T] <- s
             leaf[s] <- FALSE
             n[s] <- 0
             c_1[s] <- r
             B-Tree-Split-Child(s, 1, r)
             B-Tree-Insert-Nonfull(s, k)
        else B-Tree-Insert-Nonfull(r, k)
B-Tree-Insert-Nonfull(x, k)
    i <- n[x]
    if leaf[x]
        then while i >= 1 and k < key_i[x]
                 do key_{i+1}[x] <- key_i[x]
                    i <- i - 1
             key_{i+1}[x] <- k
             n[x] <- n[x] + 1
             Disk-Write(x)
        else while i >= 1 and k < key_i[x]
                 do i <- i - 1
             i <- i + 1
             Disk-Read(c_i[x])
             if n[c_i[x]] = 2t - 1
                 then B-Tree-Split-Child(x, i, c_i[x])
                      if k > key_i[x]
                          then i <- i + 1
             B-Tree-Insert-Nonfull(c_i[x], k)
To perform an insertion on a b-tree, the appropriate node for the key must be
located using an algorithm similar to B-Tree-Search. Next, the key must be
inserted into the node. If the node is not full prior to the insertion, no special
action is required; however, if the node is full, the node must be split to
make room for the new key. Since splitting the node results in moving one
key to the parent node, the parent node must not be full or another split
operation is required. This process may repeat all the way up to the root and
may require splitting the root node. Done naively, this approach requires two
passes: the first pass locates the node where the key should be inserted, and
the second pass performs any required splits on the ancestor nodes. The
B-Tree-Insert algorithm above avoids the second pass by splitting every full
node it encounters on the way down, so the parent of a node being split is
never full.
Splitting the root node is handled as a special case since a new root must be
created to contain the median key of the old root. Observe that a b-tree will
grow from the top.
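The single-pass insertion can be tried out with the in-memory sketch below. It is an illustration under assumptions, not a disk-backed implementation: the BTree and Node classes are invented for the example, the disk operations are left out, indices are 0-based, and the descent is written as a loop rather than the recursion used in B-Tree-Insert-Nonfull.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


class BTree:
    def __init__(self, t: int) -> None:
        if t < 2:
            raise ValueError("t must be at least 2")
        self.t = t
        self.root = Node()

    def _split_child(self, x: Node, i: int) -> None:
        t, y = self.t, x.children[i]
        z = Node(keys=y.keys[t:], children=y.children[t:])
        median = y.keys[t - 1]
        y.keys, y.children = y.keys[:t - 1], y.children[:t]
        x.keys.insert(i, median)
        x.children.insert(i + 1, z)

    def insert(self, k: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: the tree grows by one level at the top.
            self.root = Node(children=[self.root])
            self._split_child(self.root, 0)
        x = self.root
        while not x.leaf:
            i = bisect.bisect_right(x.keys, k)
            if len(x.children[i].keys) == 2 * self.t - 1:
                # Split full children on the way down (single pass).
                self._split_child(x, i)
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        bisect.insort(x.keys, k)

    def in_order(self, x: Optional[Node] = None) -> List[int]:
        x = self.root if x is None else x
        if x.leaf:
            return list(x.keys)
        out: List[int] = []
        for i, child in enumerate(x.children):
            out += self.in_order(child)
            if i < len(x.keys):
                out.append(x.keys[i])
        return out


tree = BTree(t=2)
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
    tree.insert(key)
print(tree.root.keys)   # [10, 20]
print(tree.in_order())  # [5, 6, 7, 10, 12, 17, 20, 30]
```

In this run the root is split once, when the fourth key arrives, which is the only point at which the height increases.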
B-Tree-Delete
Deletion of a key from a b-tree is possible; however, special care must be
taken to ensure that the properties of a b-tree are maintained. Several cases
must be considered. If the deletion reduces the number of keys in a node
below the minimum of t - 1 keys, this violation must be corrected by
borrowing a key from a sibling or by combining nodes, possibly reducing the
height of the tree. If the key lies in an internal node, the children on either
side of it must be rearranged. For a detailed discussion
of deleting from a b-tree, refer to Section 19.3, pages 395-397, of Cormen,
Leiserson, and Rivest or to another reference listed below.
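As a partial illustration, the sketch below shows only the step of combining two minimal sibling nodes with the key that separates them, which is the inverse of a split. It is not a complete deletion algorithm; the Node class and the function names merge_children and shrink_root are assumptions made for this example, and disk operations are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def merge_children(x: Node, i: int, t: int) -> None:
    """Merge x.children[i], the separator x.keys[i], and x.children[i + 1].

    Both children must hold exactly t - 1 keys, so the result has 2t - 1.
    """
    left, right = x.children[i], x.children[i + 1]
    assert len(left.keys) == t - 1 and len(right.keys) == t - 1
    left.keys = left.keys + [x.keys.pop(i)] + right.keys
    left.children = left.children + right.children
    x.children.pop(i + 1)


def shrink_root(root: Node) -> Node:
    """If a merge emptied the root, its only child becomes the new root."""
    if not root.keys and root.children:
        return root.children[0]
    return root


root = Node(keys=[20], children=[Node([10]), Node([30])])
merge_children(root, 0, t=2)
root = shrink_root(root)
print(root.keys)  # [10, 20, 30]
```

Here the merge removes the root's last key, so the tree loses one level, which is the height reduction mentioned above.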
Examples
Sample B-Tree
Databases