B-Trees: Balanced Tree Data Structures

B-trees are balanced tree data structures optimized for secondary storage like disks. They minimize disk accesses during operations by having nodes with many children and keys. Each node has a minimum number of children and keys to remain balanced. Common operations like search, insert, and delete on B-trees take logarithmic time due to their balanced structure and optimization for secondary storage access costs.

Uploaded by

arun10y07


Table of Contents:

I. Introduction
II. Structure of B-Trees
III. Operations on B-Trees
IV. Examples
V. Applications
VI. Works Cited
VII. Useful Links

Introduction

Tree structures support various basic dynamic set operations including
Search, Predecessor, Successor, Minimum, Maximum, Insert, and Delete in
time proportional to the height of the tree. Ideally, a tree will be balanced
and the height will be O(log n), where n is the number of nodes in the tree. To
ensure that the height of the tree is as small as possible and therefore provide
the best running time, a balanced tree structure like a red-black tree, AVL
tree, or b-tree must be used.

When working with large sets of data, it is often not possible or desirable to
maintain the entire structure in primary storage (RAM). Instead, a relatively
small portion of the data structure is maintained in primary storage, and
additional data is read from secondary storage as needed. Unfortunately, a
magnetic disk, the most common form of secondary storage, is significantly
slower than random access memory (RAM). In fact, the system often spends
more time retrieving data than actually processing data.

B-trees are balanced trees that are optimized for situations when part or all
of the tree must be maintained in secondary storage such as a magnetic disk.
Since disk accesses are expensive (time consuming) operations, a b-tree tries
to minimize the number of disk accesses. For example, a b-tree with a height
of 2 and a branching factor of 1001 can store over one billion keys but
requires at most two disk accesses to search for any key, provided the root
node is kept in primary storage (Cormen 384).

The Structure of B-Trees

Unlike a binary tree, each node of a b-tree may have a variable number of
keys and children. The keys are stored in non-decreasing order. Each key
has an associated child that is the root of a subtree containing all nodes with
keys less than or equal to the key but greater than the preceding key. A
node also has an additional rightmost child that is the root of a subtree
containing all keys greater than any key in the node.

A b-tree has a minimum number of allowable children for each node known
as the minimization factor (also called the minimum degree). If t is this
minimization factor, every node other than the root must have at least t - 1
keys. The root node is allowed to violate this property by having fewer than
t - 1 keys, although it must hold at least one key when the tree is not empty.
Every node may have at most 2t - 1 keys, so an internal node may have at
most 2t children.
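The key-count rules above can be written down as a small check. The following Python sketch is only an illustration: the names `BTreeNode` and `node_is_valid` are hypothetical and do not come from the original text, and the node is held in memory rather than on disk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    """One b-tree node: sorted keys plus, for internal nodes, len(keys) + 1 children."""
    keys: List[int] = field(default_factory=list)
    children: List["BTreeNode"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


def node_is_valid(node: BTreeNode, t: int, is_root: bool = False) -> bool:
    """Check the key-count and ordering rules for a single node."""
    min_keys = 0 if is_root else t - 1   # the root may hold fewer than t - 1 keys
    max_keys = 2 * t - 1
    if not (min_keys <= len(node.keys) <= max_keys):
        return False
    if node.keys != sorted(node.keys):   # keys must be in non-decreasing order
        return False
    # An internal node has exactly one more child than it has keys.
    return node.leaf or len(node.children) == len(node.keys) + 1
```

With t = 3, for example, a non-root node holding the keys 10 and 20 passes the check, while a non-root node holding only the key 10 does not.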

Since each node tends to have a large branching factor (a large number of
children), it is typically necessary to traverse relatively few nodes before
locating the desired key. If access to each node requires a disk access, then a
b-tree will minimize the number of disk accesses required. The minimization
factor is usually chosen so that the total size of each node corresponds to a
multiple of the block size of the underlying storage device. This choice
simplifies and optimizes disk access. Consequently, a b-tree is an ideal data
structure for situations where all data cannot reside in primary storage and
accesses to secondary storage are comparatively expensive (or time
consuming).
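As a rough illustration of how the minimization factor might be derived from the block size, the sketch below assumes a very simple node layout: a fixed-size header, 2t - 1 fixed-size keys, and 2t fixed-size child pointers, all packed into a single block. The function name and every size used here are assumptions made for the example; real systems use their own layouts.

```python
def largest_minimization_factor(block_size: int, key_size: int,
                                pointer_size: int, header_size: int = 0) -> int:
    """Largest t such that a full node (2t - 1 keys, 2t child pointers) fits in one block."""
    # Solve (2t - 1) * key_size + 2t * pointer_size + header_size <= block_size for t.
    t = (block_size - header_size + key_size) // (2 * (key_size + pointer_size))
    if t < 2:
        raise ValueError("block too small for a b-tree node with t >= 2")
    return t


# Assumed sizes: 4096-byte block, 8-byte keys, 8-byte pointers, 16-byte header.
print(largest_minimization_factor(4096, 8, 8, 16))
```

Under these assumed sizes a full node with t = 127 occupies 4072 bytes, while t = 128 would need 4104 bytes and no longer fit.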

Height of B-Trees

For n greater than or equal to one, the height h of an n-key b-tree T
with a minimum degree t greater than or equal to 2 satisfies

$$h \le \log_t \frac{n + 1}{2}$$

For a proof of the above inequality, refer to Cormen, Leiserson, and Rivest
pages 383-384.

The worst case height is O(log n). Since the "branchiness" of a b-tree can be
large compared to many other balanced tree structures, the base of the
logarithm tends to be large; therefore, the number of nodes visited during a
search tends to be smaller than required by other tree structures. Although
this does not affect the asymptotic worst case height, b-trees tend to have
smaller heights than other trees with the same asymptotic height.
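To see how strongly the minimum degree affects the height, the bound h <= log_t((n + 1) / 2) from Cormen, Leiserson, and Rivest can be evaluated for a few values of t. The sketch below is an illustration only; the function name `max_height` is hypothetical, and integer arithmetic is used to avoid floating-point rounding in the logarithm.

```python
def max_height(n: int, t: int) -> int:
    """Largest integer h with h <= log_t((n + 1) / 2), i.e. 2 * t**h <= n + 1."""
    if n < 1 or t < 2:
        raise ValueError("requires n >= 1 and t >= 2")
    h = 0
    while 2 * t ** (h + 1) <= n + 1:
        h += 1
    return h


for t in (2, 10, 100):
    print(t, max_height(1_000_000, t))
```

For one million keys the bound is 18 with t = 2, 5 with t = 10, and 2 with t = 100, which shows the effect of the larger logarithm base.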

Operations on B-Trees

The algorithms for the search, create, and insert operations are shown below.
Note that these algorithms are single pass; in other words, they do not
traverse back up the tree. Since b-trees strive to minimize disk accesses and
the nodes are usually stored on disk, this single-pass approach will reduce
the number of node visits and thus the number of disk accesses. Simpler
double-pass approaches that move back up the tree to fix violations are
possible.

Since all nodes are assumed to be stored in secondary storage (disk) rather
than primary storage (memory), all references to a given node must be
preceded by a read operation denoted by Disk-Read. Similarly, once a node
is modified and it is no longer needed, it must be written out to secondary
storage with a write operation denoted by Disk-Write. The algorithms below
assume that all nodes referenced in parameters have already had a
corresponding Disk-Read operation. New nodes are created and assigned
storage with the Allocate-Node call. The implementation details of the
Disk-Read, Disk-Write, and Allocate-Node functions are operating system and
implementation dependent.

B-Tree-Search(x, k)

i <- 1
while i <= n[x] and k > key_i[x]
    do i <- i + 1
if i <= n[x] and k = key_i[x]
    then return (x, i)
if leaf[x]
    then return NIL
    else Disk-Read(c_i[x])
         return B-Tree-Search(c_i[x], k)

The search operation on a b-tree is analogous to a search on a binary tree.
Instead of choosing between a left and a right child as in a binary tree, a
b-tree search must make an (n[x] + 1)-way choice at each node x. The correct
child is chosen by performing a linear search of the keys in the node. After
finding the first key greater than or equal to the desired key, the child
pointer to the immediate left of that key is followed. If all keys are less
than the desired key, the rightmost child pointer is followed. Of course, the
search can be terminated as soon as the desired key is found. Since the number
of nodes visited depends upon the height of the tree, B-Tree-Search performs
O(log_t n) disk accesses; with a linear scan of up to 2t - 1 keys in each
node, the total CPU time is O(t log_t n).
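The search procedure can be sketched in Python as follows. This is a minimal in-memory illustration, not the original author's code: the `Node` class, the function name `btree_search`, and the sample keys are hypothetical, indices are 0-based rather than 1-based, and the Disk-Read step is reduced to a comment.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, keys: List[int], children: Optional[List["Node"]] = None):
        self.keys = keys                  # sorted keys
        self.children = children or []    # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def btree_search(x: Node, k: int) -> Optional[Tuple[Node, int]]:
    """Return (node, index) locating k, or None if k is absent."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:   # linear scan within the node
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if x.leaf:
        return None
    # On disk, a Disk-Read of the chosen child would happen here.
    return btree_search(x.children[i], k)


# A small hypothetical tree with minimization factor t = 2.
root = Node([20, 40], [Node([5, 10]), Node([21, 30]), Node([50, 60])])
found = btree_search(root, 21)
missing = btree_search(root, 35)
```

Here the search for 21 descends into the middle child and returns that node with index 0, while the search for 35 reaches a leaf without a match and returns None.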

B-Tree-Create(T)

x <- Allocate-Node()
leaf[x] <- TRUE
n[x] <- 0
Disk-Write(x)
root[T] <- x

The B-Tree-Create operation creates an empty b-tree by allocating a new
root node that has no keys and is a leaf node. Only the root node is permitted
to have these properties; all other nodes must meet the criteria outlined
previously. The B-Tree-Create operation runs in time O(1).

B-Tree-Split-Child(x, i, y)

z <- Allocate-Node()
leaf[z] <- leaf[y]
n[z] <- t - 1
for j <- 1 to t - 1
    do key_j[z] <- key_{j+t}[y]
if not leaf[y]
    then for j <- 1 to t
             do c_j[z] <- c_{j+t}[y]
n[y] <- t - 1
for j <- n[x] + 1 downto i + 1
    do c_{j+1}[x] <- c_j[x]
c_{i+1}[x] <- z
for j <- n[x] downto i
    do key_{j+1}[x] <- key_j[x]
key_i[x] <- key_t[y]
n[x] <- n[x] + 1
Disk-Write(y)
Disk-Write(z)
Disk-Write(x)

If a node becomes "too full," it is necessary to perform a split operation.
The split operation moves the median key of node y into its parent x, where y
is the ith child of x. A new node, z, is allocated, and all keys in y right of the
median key are moved to z. The keys left of the median key remain in the
original node y. The new node, z, becomes the child immediately to the right
of the median key that was moved to the parent x, and the original node, y,
becomes the child immediately to the left of the median key that was moved
into the parent x.

The split operation transforms a full node with 2t - 1 keys into two nodes
with t - 1 keys each. Note that one key is moved into the parent node. The
B-Tree-Split-Child algorithm will run in time O(t) where t is constant.
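A split can be sketched in Python with list slicing. This is an in-memory illustration under stated assumptions: the `Node` class and the function name `split_child` are hypothetical, indices are 0-based, the parent is assumed not to be full, and the Disk-Write calls are omitted.

```python
from typing import List, Optional


class Node:
    def __init__(self, keys: Optional[List[int]] = None,
                 children: Optional[List["Node"]] = None):
        self.keys = keys or []
        self.children = children or []

    @property
    def leaf(self) -> bool:
        return not self.children


def split_child(x: Node, i: int, t: int) -> None:
    """Split the full child x.children[i] (2t - 1 keys) around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "only a full node may be split"
    z = Node(keys=y.keys[t:], children=y.children[t:])  # right half moves to z
    median = y.keys[t - 1]
    y.keys = y.keys[:t - 1]          # left half stays in y
    y.children = y.children[:t]      # stays empty when y is a leaf
    x.keys.insert(i, median)         # median moves up into the parent
    x.children.insert(i + 1, z)      # z sits immediately right of the median


# Hypothetical example with t = 2: the first child holds 2t - 1 = 3 keys.
parent = Node(keys=[50], children=[Node([10, 20, 30]), Node([60, 70])])
split_child(parent, 0, t=2)
```

After the call, the parent holds the keys 20 and 50, and its three children hold [10], [30], and [60, 70].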

B-Tree-Insert(T, k)

r <- root[T]
if n[r] = 2t - 1
    then s <- Allocate-Node()
         root[T] <- s
         leaf[s] <- FALSE
         n[s] <- 0
         c_1[s] <- r
         B-Tree-Split-Child(s, 1, r)
         B-Tree-Insert-Nonfull(s, k)
    else B-Tree-Insert-Nonfull(r, k)

B-Tree-Insert-Nonfull(x, k)

i <- n[x]
if leaf[x]
    then while i >= 1 and k < key_i[x]
             do key_{i+1}[x] <- key_i[x]
                i <- i - 1
         key_{i+1}[x] <- k
         n[x] <- n[x] + 1
         Disk-Write(x)
    else while i >= 1 and k < key_i[x]
             do i <- i - 1
         i <- i + 1
         Disk-Read(c_i[x])
         if n[c_i[x]] = 2t - 1
             then B-Tree-Split-Child(x, i, c_i[x])
                  if k > key_i[x]
                      then i <- i + 1
         B-Tree-Insert-Nonfull(c_i[x], k)

To perform an insertion on a b-tree, the appropriate node for the key must be
located using an algorithm similar to B-Tree-Search. Next, the key must be
inserted into the node. If the node is not full prior to the insertion, no special
action is required; however, if the node is full, the node must be split to
make room for the new key. Since splitting the node results in moving one
key to the parent node, the parent node must not be full or another split
operation is required. This process may repeat all the way up to the root and
may require splitting the root node. This approach requires two passes. The
first pass locates the node where the key should be inserted; the second pass
performs any required splits on the ancestor nodes.

Since each access to a node may correspond to a costly disk access, it is
desirable to avoid the second pass by ensuring that the parent node is never
full. To accomplish this, the presented algorithm splits any full nodes
encountered while descending the tree. Although this approach may result in
unnecessary split operations, it guarantees that the parent never needs to be
split and eliminates the need for a second pass up the tree. Since a split runs
in time linear in t, it has little effect on the O(t log_t n) running time of
B-Tree-Insert.

Splitting the root node is handled as a special case since a new root must be
created to contain the median key of the old root. Observe that a b-tree will
grow from the top.
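The single-pass insertion, including the special case for a full root, can be sketched in Python as follows. This is an in-memory illustration only: the class names `Node` and `BTree`, the helper `inorder`, and the sample keys are hypothetical; the descent is written as a loop instead of a recursive call; and the standard `bisect` module replaces the hand-written key scan.

```python
import bisect
from typing import List, Optional


class Node:
    def __init__(self, keys: Optional[List[int]] = None,
                 children: Optional[List["Node"]] = None):
        self.keys = keys or []
        self.children = children or []

    @property
    def leaf(self) -> bool:
        return not self.children


class BTree:
    def __init__(self, t: int):
        self.t = t
        self.root = Node()          # an empty leaf, as in B-Tree-Create

    def _split_child(self, x: Node, i: int) -> None:
        t, y = self.t, x.children[i]
        z = Node(y.keys[t:], y.children[t:])      # right half of y
        x.keys.insert(i, y.keys[t - 1])           # median moves up
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def insert(self, k: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: create a new root above it, so the tree grows at the top.
            self.root = Node(children=[self.root])
            self._split_child(self.root, 0)
        x = self.root
        while not x.leaf:
            i = bisect.bisect_right(x.keys, k)    # child that should receive k
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split_child(x, i)           # split full nodes on the way down
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        bisect.insort(x.keys, k)                  # the leaf is guaranteed not to be full


def inorder(x: Node) -> List[int]:
    """Collect all keys of the subtree rooted at x in sorted order."""
    if x.leaf:
        return list(x.keys)
    out: List[int] = []
    for child, key in zip(x.children, x.keys):
        out += inorder(child)
        out.append(key)
    return out + inorder(x.children[-1])


tree = BTree(t=2)
sample_keys = [33, 5, 21, 48, 12, 7, 60, 1, 29, 40]
for key in sample_keys:
    tree.insert(key)
print(inorder(tree.root))
```

An in-order walk of the result yields the inserted keys in sorted order, and because each node holds at most three keys when t = 2, the root must have been split at least once along the way.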

B-Tree-Delete

Deletion of a key from a b-tree is possible; however, special care must be
taken to ensure that the properties of a b-tree are maintained. Several cases
must be considered. If the deletion reduces the number of keys in a node
below the minimum of t - 1, this violation must be corrected by borrowing a
key from a sibling or by combining nodes, possibly reducing the height of the
tree. If the key is stored in an internal node, its children must be
rearranged. For a detailed discussion of deleting from a b-tree, refer to
Section 19.3, pages 395-397, of Cormen, Leiserson, and Rivest or to another
reference listed below.

Examples

[Figure: Sample B-Tree]

[Figure: Searching a B-Tree for Key 21]

[Figure: Inserting Key 33 into a B-Tree (w/ Split)]

Applications

Databases

A database is a collection of data organized in a fashion that facilitates
updating, retrieving, and managing the data. The data can consist of
anything, including, but not limited to, names, addresses, pictures, and
numbers. Databases are commonplace and are used every day. For example,
an airline reservation system might maintain a database of available flights,
customers, and tickets issued. A teacher might maintain a database of
student names and grades.

Because computers excel at quickly and accurately manipulating, storing,
and retrieving data, databases are often maintained electronically using a
database management system. Database management systems are essential
components of many everyday business operations. Database products like
Microsoft SQL Server, Sybase Adaptive Server, IBM DB2, and Oracle serve
as a foundation for accounting systems, inventory systems, medical
recordkeeping systems, airline reservation systems, and countless other
important aspects of modern businesses.

It is not uncommon for a database to contain millions of records requiring
many gigabytes of storage. For example, TELSTRA, an Australian
telecommunications company, maintains a customer billing database with 51
billion rows (yes, billion) and 4.2 terabytes of data. In order for a database to
be useful and usable, it must support the desired operations, such as retrieval
and storage, quickly. Because databases cannot typically be maintained
entirely in memory, b-trees are often used to index the data and to provide
fast access. For example, searching an unindexed and unsorted database
containing n key values will have a worst case running time of O(n); if the
same data is indexed with a b-tree, the same search operation will run in
O(log n). To perform a search for a single key on a set of one million keys
(1,000,000), a linear search will require at most 1,000,000 comparisons. If
the same data is indexed with a b-tree of minimum degree 10, at most 114
comparisons will be required in the worst case: the height bound allows at
most 6 levels, and each node holds at most 2t - 1 = 19 keys. Clearly, indexing
large amounts of data can significantly improve search performance. Although
other balanced tree structures can be used, a b-tree also optimizes costly disk
accesses that are of concern when dealing with large data sets.
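The figure of 114 comparisons can be reproduced by combining the height bound h <= log_t((n + 1) / 2) with the maximum of 2t - 1 keys per node. The sketch below is an illustration; the function name is hypothetical, and the result is an upper bound that assumes a linear scan of a completely full node on every level.

```python
def worst_case_comparisons(n: int, t: int) -> int:
    """Upper bound on key comparisons for one search in an n-key b-tree."""
    h = 0
    while 2 * t ** (h + 1) <= n + 1:   # largest h with h <= log_t((n + 1) / 2)
        h += 1
    levels = h + 1                     # a tree of height h has h + 1 levels
    return levels * (2 * t - 1)        # at most 2t - 1 comparisons per node


print(worst_case_comparisons(1_000_000, 10))
```

For n = 1,000,000 and t = 10 the height is at most 5, so a search visits at most 6 nodes of at most 19 keys each.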

Concurrent Access to B-Trees

Databases typically run in multiuser environments where many users can
concurrently perform operations on the database. Unfortunately, this
common scenario introduces complications. For example, imagine a
database storing bank account balances. Now assume that someone attempts
to withdraw $40 from an account containing $60. First, the current balance
is checked to ensure sufficient funds. After funds are disbursed, the balance
of the account is reduced. This approach works flawlessly until concurrent
transactions are considered. Suppose that another person simultaneously
attempts to withdraw $30 from the same account. At the same time the
account balance is checked by the first person, the account balance is also
retrieved for the second person. Since neither person is requesting more
funds than are currently available, both requests are satisfied for a total of
$70. After the first person's transaction, $20 should remain ($60 - $40), so
the new balance is recorded as $20. Next, the account balance after the
second person's transaction, $30 ($60 - $30), is recorded, overwriting the $20
balance. Unfortunately, $70 has been disbursed, but the account balance
has only been decreased by $30. Clearly, this behavior is undesirable, and
special precautions must be taken.
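The interleaving described above can be replayed step by step. The following Python sketch does not use real threads; it simply executes the reads and writes in the problematic order so the outcome is reproducible. All variable names are chosen for this illustration.

```python
balance = 60

# Both withdrawals read the balance before either one writes it back.
seen_by_first = balance            # first person sees $60
seen_by_second = balance           # second person also sees $60

paid_out = 0
if seen_by_first >= 40:            # check passes: 60 >= 40
    paid_out += 40
    balance = seen_by_first - 40   # balance recorded as $20
if seen_by_second >= 30:           # check passes on the stale value: 60 >= 30
    paid_out += 30
    balance = seen_by_second - 30  # overwrites $20 with $30

print(paid_out, balance)
```

The replay ends with $70 paid out and a recorded balance of $30, matching the scenario in the text.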

A b-tree suffers from similar problems in a multiuser environment. If two or
more processes are manipulating the same tree, it is possible for the tree to
become corrupt and result in data loss or errors.

The simplest solution is to serialize access to the data structure. In other
words, if another process is using the tree, all other processes must wait.
Although this is feasible in many cases, it can place an unnecessary and costly
limit on performance because many operations actually can be performed
concurrently without risk. Locking, introduced by Gray and refined by many
others, provides a mechanism for controlling concurrent operations on data
structures in order to prevent undesirable side effects and to ensure
consistency. For a detailed discussion of this and other concurrency control
mechanisms, please refer to the references below.
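Serialized access, the simplest approach mentioned above, can be sketched with a single lock that every operation must hold. In this illustration a sorted Python list stands in for the b-tree, and the class name `SerializedIndex` and its methods are hypothetical. This is coarse-grained serialization only, not the finer-grained locking schemes discussed in the literature.

```python
import bisect
import threading
from typing import List


class SerializedIndex:
    """A sorted key list standing in for a b-tree; one lock serializes all access."""

    def __init__(self) -> None:
        self._keys: List[int] = []
        self._lock = threading.Lock()

    def insert(self, key: int) -> None:
        with self._lock:                      # other threads wait here
            bisect.insort(self._keys, key)

    def contains(self, key: int) -> bool:
        with self._lock:
            i = bisect.bisect_left(self._keys, key)
            return i < len(self._keys) and self._keys[i] == key

    def snapshot(self) -> List[int]:
        with self._lock:
            return list(self._keys)


def insert_range(index: SerializedIndex, start: int, stop: int) -> None:
    for key in range(start, stop):
        index.insert(key)


index = SerializedIndex()
workers = [threading.Thread(target=insert_range, args=(index, lo, lo + 100))
           for lo in (0, 100, 200, 300)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Because only one thread at a time can hold the lock, the four workers cannot interleave inside an insertion, and the index ends up holding every key from 0 to 399 in order. The cost is that readers also wait, which is the performance limit the text describes.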
