Binary Search Trees
Adnan Aziz
1 BST basics
Based on CLRS, Ch 12.
Motivation:
• Hash tables — can perform insert, delete, and lookup efficiently, O(1) on average
What about searching in a heap? Finding the min in a heap? The max in a hash table?
• Binary Search Trees support search, insert, delete, max, min, successor, predecessor
• added caveat: keys are stored at nodes, in a way so as to satisfy the BST property:
– for any node x in a BST, if y is a node in x’s left subtree, then key[y] ≤ key[x],
and if y is a node in x’s right subtree, then key[y] ≥ key[x].
Implementation — represent a node by an object with four data members: key, pointer to left
child, pointer to right child, and pointer to parent (a NIL pointer denotes empty)
[Figure 1: Two example BSTs on the keys {2, 3, 5, 7, 8} — one balanced, one skewed.]
[Figure 2: Example BST rooted at 15, with keys {2, 3, 4, 6, 7, 13, 15, 17, 18, 20}; the exercises below refer to this tree.]
1.1 Query operations
The BST property allows us to print out all the keys in a BST in sorted order:
INORDER-TREE-WALK(x)
if x != NIL
then INORDER-TREE-WALK(left[x])
print key[x]
INORDER-TREE-WALK(right[x])
TREE-SEARCH(x,k)
if x = NIL or k = key[x]
then return x
if k < key[x]
then return TREE-SEARCH(left[x],k)
else return TREE-SEARCH(right[x],k)
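The two procedures above can be sketched in runnable Python; the Node layout follows the pseudocode's fields, with None playing the role of NIL (the class and function names are illustrative choices, not from the notes):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None  # None plays NIL

def inorder(x, out):
    """INORDER-TREE-WALK: append the keys of x's subtree in sorted order."""
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out

def tree_search(x, k):
    """TREE-SEARCH: return the node with key k, or None if absent."""
    if x is None or k == x.key:
        return x
    if k < x.key:
        return tree_search(x.left, k)
    return tree_search(x.right, k)
```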
Try—search 13, 11 in example
TREE-MINIMUM(x)
while (left[x] != NIL)
do x <- left[x]
return x
• Symmetric procedure to find max.
Try—min=2, max=20
1.2 Successor and Predecessor
Given a node x in a BST, sometimes want to find its “successor”
• node whose key appears immediately after x’s key in an in-order walk
Conceptually: if the right child of x is not NIL, the successor is the minimum of x's right
subtree; otherwise examine the ancestors
• Tricky when x is the right child of its parent — need to keep going up the search tree
until we leave a left subtree behind
TREE-SUCCESSOR(x)
if right[x] != NIL
then return TREE-MINIMUM( right[x] )
y <- p[x]
while y != NIL and x = right[y]
do x <- y
y <- p[y]
return y
• Symmetric procedure to find pred.
Try—succ 15, 6, 13, 20; pred 15, 6, 7, 2
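A runnable sketch of TREE-MINIMUM and TREE-SUCCESSOR; the `attach` helper for building the example tree by hand is my own addition:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def attach(parent, key, side):
    """Illustrative helper: hang a new child off parent on 'left'/'right'."""
    child = Node(key)
    child.parent = parent
    setattr(parent, side, child)
    return child

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    # Case 1: successor is the leftmost node of the right subtree.
    if x.right is not None:
        return tree_minimum(x.right)
    # Case 2: climb until we leave a left subtree behind.
    y = x.parent
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y
```

Run against the example tree: succ(15) = 17, succ(13) = 15, succ(20) = NIL.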
Theorem: all operations above take time O(h) where h is the height of the tree.
1.3 Updates
1.3.1 Inserts
Insert: Given a node z with key v, left[z],right[z],p[z] all NIL, and a BST T
• update T and z so that updated tree includes the node z with BST property still true
Idea behind algorithm: begin at root, trace path downward comparing v with key at current
node.
• get to the bottom ⇒ insert node z by setting its parent to last node on path, update
parent left/right child appropriately
Refer to TREE-INSERT(T,z) in CLRS for detailed pseudo-code.
Try—insert keys 12, 1, 7, 16, 25 in the example
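The idea above (trace a path downward, then hang z off the last node on the path) can be sketched as follows; this mirrors CLRS's TREE-INSERT, with None as NIL:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def tree_insert(root, z):
    """Insert node z into the tree rooted at root; return the root."""
    y, x = None, root
    while x is not None:          # trace path downward, comparing keys
        y = x
        x = x.left if z.key < x.key else x.right
    z.parent = y                  # y is the last node on the path
    if y is None:
        return z                  # tree was empty; z is the new root
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return root

def inorder(x, out=None):
    if out is None:
        out = []
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out
```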
1.3.2 Deletions
Delete: given a node z in a BST T, remove z's key while preserving the BST property. Three cases:
1. z has no children — simply remove it, updating its parent's child pointer
2. z has one child — "splice" z out by linking z's parent directly to z's child
3. z has two children — splice z's successor (call it y) out of the tree, and replace z's
contents with those of y. Crucial fact: y cannot have a left child, so y falls under case 1 or 2
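The three cases can be sketched in Python, following the structure of CLRS's TREE-DELETE; the iterative insert and search helpers used to build the example are my own:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def insert(root, key):
    """Plain BST insert (to build examples); returns the root."""
    z, y, x = Node(key), None, root
    while x is not None:
        y = x
        x = x.left if key < x.key else x.right
    z.parent = y
    if y is None:
        return z
    if key < y.key:
        y.left = z
    else:
        y.right = z
    return root

def minimum(x):
    while x.left is not None:
        x = x.left
    return x

def successor(x):
    if x.right is not None:
        return minimum(x.right)
    y = x.parent
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y

def delete(root, z):
    """Remove z's key; returns the (possibly new) root."""
    # Cases 1 and 2: splice out z itself.  Case 3: splice out z's
    # successor y (which has no left child) and copy its key into z.
    y = z if z.left is None or z.right is None else successor(z)
    x = y.left if y.left is not None else y.right
    if x is not None:
        x.parent = y.parent
    if y.parent is None:
        root = x
    elif y is y.parent.left:
        y.parent.left = x
    else:
        y.parent.right = x
    if y is not z:
        z.key = y.key
    return root

def search(x, k):
    while x is not None and x.key != k:
        x = x.left if k < x.key else x.right
    return x

def inorder(x, out):
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out
```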
The expected O(lg n) height of a randomly built BST holds only when there are no deletes — if
there are deletes, the height will tend to O(√n). This is because of the asymmetry in
deletion — the predecessor is always used to replace the node. The asymmetry can be removed
by alternating between the successor and predecessor.
Question 1. Given a BST on n nodes, can you always find a BST on the same n keys having
height O(log n)?
• Yes, just think of a (near) complete binary tree
Question 2. Can you implement insertion and deletion so that the height of the tree is always
O(log n)?
• Yes, but this is not so obvious — performing these operations in O(log n) time is a lot
trickier.
Broadly speaking, there are two options for keeping a BST balanced:
• Store some extra information at the nodes, related to the relative balance of the children
(e.g., AVL and red-black trees)
• Use randomization (e.g., treaps)
2.1 Treaps
Randomly built BSTs tend to be balanced ⇒ given a set S, randomly permute its elements, then
insert them in that order. (CLRS 12.4)
• In general, not all elements of S are available at the start
Solution: assign a random number called the "priority" to each key (assume keys and priorities
are distinct). In addition to satisfying the BST property (v a left/right child of u ⇒
key(v) < / > key(u)), require that if v is a child of u, then priority(v) > priority(u).
• Treaps will have the following property: If we insert x1 , . . . , xn into a treap, then the
resulting tree is the one that would have resulted if keys had been inserted in the order
of their priorities
Treap insertion is very fast — only a constant number of pointer updates in expectation. Treaps
are used in LEDA, a widely used advanced data structure library for C++.
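A sketch of treap insertion: insert as in a plain BST, then rotate the new node up while the heap order on priorities is violated (smaller priority closer to the root, matching the figure's key:priority labels). Function names are illustrative:

```python
import random

class TreapNode:
    def __init__(self, key, priority=None):
        self.key = key
        self.priority = random.random() if priority is None else priority
        self.left = self.right = None

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def treap_insert(root, key, priority=None):
    if root is None:
        return TreapNode(key, priority)
    if key < root.key:
        root.left = treap_insert(root.left, key, priority)
        if root.left.priority < root.priority:   # heap order violated
            root = rotate_right(root)
    else:
        root.right = treap_insert(root.right, key, priority)
        if root.right.priority < root.priority:
            root = rotate_left(root)
    return root

def inorder(x, out):
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out
```

Inserting the figure's first few key:priority pairs leaves F:2, the smallest priority, at the root, exactly as if the keys had been inserted in priority order.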
Figure 3: Treap insertion. [Diagrams omitted: a sequence of insertions of key:priority pairs
G:4, D:9, C:25, F:2, B:7, H:5, E:23, K:65, I:73, with rotations restoring the heap order on
priorities.]
Figure 4: Rotations. [Diagram omitted: left rotate(x) transforms x (with left subtree a and
right child y over subtrees b, c) into y (with left child x over a, b, and right subtree c);
right rotate(x) is the inverse.]
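The left rotation of Figure 4 can be sketched with the parent-pointer bookkeeping spelled out; this follows the structure of CLRS's LEFT-ROTATE, with None as NIL:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def left_rotate(root, x):
    """Rotate left around x; return the (possibly new) tree root."""
    y = x.right                    # y must not be NIL
    x.right = y.left               # y's left subtree becomes x's right
    if y.left is not None:
        y.left.parent = x
    y.parent = x.parent            # link x's old parent to y
    if x.parent is None:
        root = y                   # x was the root
    elif x is x.parent.left:
        x.parent.left = y
    else:
        x.parent.right = y
    y.left = x                     # put x on y's left
    x.parent = y
    return root
```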
Figure 5: AVL: Two possible insertions that invalidate the balance condition. There are two
symmetric cases on the right side. A is the first node on the search path from the new node
to the root where the height property fails. [Diagram omitted: node A with children B, C over
subtrees T1–T4; the new node lands in T1 or T2.]
[Diagram omitted: the single-rotation case, showing subtree heights h+1, h, h−1 around nodes
A, B, C and subtrees T1–T4 before and after the rotation.]
[Diagram omitted: the double-rotation case on nodes A through G and subtrees T1–T4; after the
two rotations, D roots the rebalanced subtree with subtree heights h and h−1.]
P5: for every node, all paths from that node to descendant leaves contain the same number of
black nodes.
Define the black height of a node x, bh(x), to be the number of black nodes, not including x,
on any path from x down to a leaf (P5 guarantees that the black height is well defined).
Figure 8: RB tree: thick circles represent black nodes; we're ignoring the NIL leaves.
[Diagram omitted: tree rooted at 26, with children 17 and 41, down to leaves 7, 12, 15, 20,
35, 39.]
Proof: the subtree rooted at any node x has at least 2^bh(x) − 1 internal nodes — this follows
directly by induction on the height of x.
Then note that P4 (a red node's children are both black) guarantees that at least half the
nodes on any path from the root to a leaf (including the root) are black. So the black height
of the root is at least h/2, hence n ≥ 2^(h/2) − 1; the result follows by moving the −1 to the
left side and taking logs.
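Spelling out the final chain of inequalities:

```latex
n \;\ge\; 2^{\mathrm{bh}(\mathrm{root})} - 1 \;\ge\; 2^{h/2} - 1
\quad\Longrightarrow\quad
n + 1 \;\ge\; 2^{h/2}
\quad\Longrightarrow\quad
h \;\le\; 2\lg(n+1).
```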
From the above it follows that the insert and delete operations take O(lg n) time.
However, they may leave the tree in violation of some of the properties P1–P5.
The properties are restored by changing colors and updating pointers through rotations.
Conceptually rotation is simple, but the code looks a mess — see page 278, CLRS.
Example of update sequence for insertion in Figure 9.
Figure 9: Example update sequence for red-black insertion (key 4 inserted below the tree
rooted at 11). [Diagrams omitted.] The fixup cases applied, in order:
• z and p[z] red, z's uncle red → recolor
• z and p[z] red, z's uncle black, z a right child of p[z] → left rotate
• z and p[z] red, z a left child of p[z] → right rotate and recolor
• if x has a grandparent (which implies it has a parent), and both x and p(x) are left
children or both are right children, then rotate at p(p(x)) and then at p(x)
• if x has a grandparent and x is a left child while p(x) is a right child, or vice versa,
rotate at p(x) and then at the new p(x) (which will be the old grandparent of x)
• most often, augment an existing data structure with some auxiliary information
One approach to returning rank information fast: a balanced BST with an additional size field
at each node
• Retrieve element with given rank
Let's see how to compute rank information using the size field; later we will show how size
can be updated through inserts and deletes.
Idea: at the start of each iteration of the while loop, r is the rank of key[x] in the subtree
rooted at node y
Both rank algorithms are O(h) (= O(lg n) for balanced BST).
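The rank pseudocode is not reproduced in these notes, so the following is a sketch of the standard CLRS OS-SELECT and OS-RANK procedures, with size fields maintained during a plain (unbalanced) insert; names follow CLRS, the insert helper is my own:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None
        self.size = 1            # number of nodes in the subtree rooted here

def size(x):
    return x.size if x is not None else 0

def insert(root, key):
    """Plain BST insert that also maintains size fields on the way down."""
    if root is None:
        return Node(key)
    root.size += 1
    if key < root.key:
        root.left = insert(root.left, key)
        root.left.parent = root
    else:
        root.right = insert(root.right, key)
        root.right.parent = root
    return root

def os_select(x, i):
    """Return the node with rank i (1-indexed) in x's subtree."""
    r = size(x.left) + 1          # rank of x within its own subtree
    if i == r:
        return x
    if i < r:
        return os_select(x.left, i)
    return os_select(x.right, i - r)

def os_rank(root, x):
    """Rank of node x's key in the whole tree rooted at root."""
    r = size(x.left) + 1
    y = x
    while y is not root:          # invariant: r = rank of x in y's subtree
        if y is y.parent.right:
            r += size(y.parent.left) + 1
        y = y.parent
    return r
```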
How to preserve size field through inserts and deletes?
• Insert
– before rotations: simply increment the size field of each node on the search path
– rotations: only two nodes have their size fields change
– upshot: insert remains O(h)
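The point that only two nodes need their size fields recomputed per rotation can be sketched as follows (field and function names assumed, not from the notes):

```python
def size(x):
    return x.size if x is not None else 0

class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)

def rotate_left(x):
    """Left-rotate at x; only x and y need their size fields fixed."""
    y = x.right
    x.right = y.left
    y.left = x
    y.size = x.size                          # y now roots the old subtree
    x.size = 1 + size(x.left) + size(x.right)
    return y
```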
• if i and i′ are intervals, then either they overlap, or one lies entirely to the left of the other
• store additional information: max[x], the maximum value of any endpoint stored in the
subtree rooted at x
// find a node in T whose interval overlaps interval i
INTERVAL-SEARCH( T, i )
x <- root[T]
while ( x != nil[T] ) && ( i does not overlap int[x] )
do if ( left[x] != nil[T] ) && ( max[left[x]] >= low[i] )
then x <- left[x]
else x <- right[x]
return x
• Correctness (loop invariant): if tree T contains an interval that overlaps i, then the
subtree rooted at x contains such an interval
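A runnable sketch of the interval search above: intervals are (low, high) pairs, nodes are keyed on low, and each node caches max, the largest high endpoint in its subtree. Construction here is plain unbalanced BST insertion, an assumption for illustration:

```python
class INode:
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.max = high                       # max endpoint in this subtree
        self.left = self.right = None

def insert(root, low, high):
    if root is None:
        return INode(low, high)
    if low < root.low:
        root.left = insert(root.left, low, high)
    else:
        root.right = insert(root.right, low, high)
    root.max = max(root.max, high)            # maintain the max field
    return root

def overlaps(node, low, high):
    return node.low <= high and low <= node.high

def interval_search(x, low, high):
    """Return a node whose interval overlaps [low, high], or None."""
    while x is not None and not overlaps(x, low, high):
        if x.left is not None and x.left.max >= low:
            x = x.left                        # an overlap, if any, is on the left
        else:
            x = x.right
    return x
```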