A Dichromatic Framework for Balanced Trees
Xerox Palo Alto Research Center, Carnegie-Mellon University; Brown University
ABSTRACT

In this paper we present a uniform framework for the implementation and study of balanced tree algorithms. We show how to embed in this framework the best known balanced tree techniques and then use the framework to develop new algorithms which perform the update and rebalancing in one pass, on the way down towards a leaf. We conclude with a study of performance issues and concurrent updating.

0. Introduction

Balanced trees are among the oldest and most widely used data structures for searching. These trees allow a wide variety of operations, such as search, insertion, deletion, merging, and splitting, to be performed in time O(lg N), where N denotes the size of the tree [AHU], [Kn]. (Throughout the paper lg will denote log to the base 2.) A number of different types of balanced trees have been proposed, and while the related algorithms are often conceptually simple, they have proven cumbersome to implement in practice. Also, the variety of such trees and the lack of good analytic results describing their performance has made it difficult to decide which is best in a given situation.

In this paper we present a uniform framework for the implementation and study of balanced tree algorithms. The framework deals exclusively with binary trees which contain two kinds of nodes: internal and external. Each internal node contains a key (chosen from a linear order) and has two links to other nodes (internal or external). External nodes contain no keys and have null links. If such a tree is traversed in symmetric order [Kn] then the internal nodes will be visited in increasing order of their keys. A second defining feature of the framework is that it allows one bit per node, called the color of the node, to store balance information. We will use red and black as the two colors. In section 1 we further elaborate upon this dichromatic framework and show how to embed in it the best known balanced tree algorithms. In doing so, we will discover surprising new and efficient implementations of these techniques.

In section 2 we use the framework to develop new balanced tree algorithms which perform the update and rebalancing in one pass, on the way down towards a leaf. As we will see, this has a number of significant advantages over the older methods. We shall examine a number of variations on a common theme and exhibit full implementations which are notable for their brevity. One implementation is examined carefully, and some properties about its behavior are proved.

In both sections 1 and 2 particular attention is paid to practical implementation issues, and complete implementations are given for all of the important algorithms. This is significant because one measure under which balanced tree algorithms can differ greatly is the amount of code required to actually implement them.

Section 3 deals with the analysis of the algorithms. New results are given for the worst case performance, and a technique for studying the average case is described. While no balanced tree algorithm has yet satisfactorily submitted to an average case analysis, empirical results are given which show that the various algorithms differ only slightly in performance. One implication of this is that the top-down algorithms of section 2 can be recommended for most applications because of their simplicity.

Finally, in section 4, we discuss some other properties of the trees. In particular, a one-pass top-down deletion algorithm is presented. In addition, we consider how to decouple the balancing from the updating operations and we explore parallel updating.

1. The Uniform Framework

In this section we present a uniform framework for describing balanced trees. We show how to embed in this framework the most widely used balanced tree schemes, namely B-trees [UaMe], and AVL trees [AVL]. In fact this embedding will give us interesting and novel implementations of these two schemes.

We consider rebalancing transformations which maintain the symmetric order of the keys and which are local to a small portion of the tree for obvious efficiency reasons. These transformations will change the structure of the tree in the same way as the single and double rotations used by AVL trees [Kn]. The difference between the various algorithms we discuss arises in the decision of when to rotate, and in the manipulation of the node colors.
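To make the idea of a local, order-preserving transformation concrete, here is a minimal Python sketch of single and double rotations on plain binary nodes. It is an illustration of the standard rotations, not code from this paper: colors are omitted, `None` stands in for an external node, and the names `Node`, `rotate_left`, `rotate_right` and `double_rotate_right` are our own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None   # None stands for an external node
    right: Optional["Node"] = None


def rotate_right(y: Node) -> Node:
    """Single rotation: y's left son x moves up; x's right subtree moves under y."""
    x = y.left
    y.left, x.right = x.right, y
    return x


def rotate_left(x: Node) -> Node:
    """Mirror image of rotate_right."""
    y = x.right
    x.right, y.left = y.left, x
    return y


def double_rotate_right(z: Node) -> Node:
    """Double rotation: rotate z's left son to the left, then z to the right,
    so the inner grandson becomes the root of the subtree."""
    z.left = rotate_left(z.left)
    return rotate_right(z)


def inorder(n: Optional[Node]) -> list:
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


# Zig-zag shape 30 -> 10 -> 20 becomes a subtree rooted at 20; symmetric order is unchanged.
root = Node(30, Node(10, None, Node(20)), Node(40))
root = double_rotate_right(root)
print(root.key, inorder(root))   # 20 [10, 20, 30, 40]
```

Each rotation changes only a constant number of links, which is why such transformations can be applied anywhere along a search path at constant cost.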
This work was done in part while this author was a Visiting Scientist at the Xerox Palo Alto Research Center and in part under support from the National Science Foundation, grant no. MCS75-23738.

CH1397-9/78/0000-0008$00.75 © 1978 IEEE

For our first example, let us consider the implementation of 2-3 trees, the simplest type of B-tree. Recall that a 2-3 tree consists of 2-nodes, which have one key and two sons, 3-nodes, which have two keys and three sons, and external nodes which have no keys and no sons. Inserting a new key into a 2-3 tree involves first doing an unsuccessful search which terminates at an external node, then inserting the new key into the father of that node. If this is a 3-node, it must be split into two 2-nodes, and the overflow key inserted into its father, and so on. The "balance" in the tree comes from the fact that all paths starting at an internal node and ending at an external node have the same length.
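The insertion procedure just described (unsuccessful search, insert into the father, split an overfull node and pass the overflow key upwards) can be sketched directly on multiway nodes. This is a minimal Python sketch under our own assumptions: a node is a list of sorted keys plus a list of sons, `None` represents an external node, and the names `Node23`, `Tree23`, `external_depths` and `inorder` are hypothetical. It does not use the red/black representation.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node23:
    keys: list   # one key (2-node) or two keys (3-node), sorted
    kids: list   # len(keys) + 1 sons; None stands for an external node


def _insert(node: Node23, v) -> Optional[tuple]:
    """Insert v below node; return (left, overflow_key, right) if node had to split."""
    i = 0
    while i < len(node.keys) and v > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == v:
        return None                                  # key already present
    if node.kids[i] is None:
        # The unsuccessful search ended at an external node: put v into its father.
        node.keys.insert(i, v)
        node.kids.insert(i + 1, None)
    else:
        split = _insert(node.kids[i], v)
        if split is None:
            return None
        left, mid, right = split
        node.keys.insert(i, mid)                     # overflow key moves up one level
        node.kids[i:i + 1] = [left, right]
    if len(node.keys) < 3:
        return None
    # Three keys do not fit: split into two 2-nodes and pass the middle key up.
    return (Node23(node.keys[:1], node.kids[:2]), node.keys[1],
            Node23(node.keys[2:], node.kids[2:]))


class Tree23:
    def __init__(self):
        self.root: Optional[Node23] = None

    def insert(self, v) -> None:
        if self.root is None:
            self.root = Node23([v], [None, None])
            return
        split = _insert(self.root, v)
        if split is not None:                        # the root split: tree grows by one level
            left, mid, right = split
            self.root = Node23([mid], [left, right])


def external_depths(node, depth=0):
    """Depths of all external nodes; in a 2-3 tree they are all equal."""
    if node is None:
        return [depth]
    return [d for kid in node.kids for d in external_depths(kid, depth + 1)]


def inorder(node):
    if node is None:
        return []
    out = []
    for kid, key in zip(node.kids, node.keys):
        out += inorder(kid) + [key]
    return out + inorder(node.kids[-1])


t = Tree23()
for k in [40, 10, 70, 20, 50, 60, 30, 80, 90]:
    t.insert(k)
print(len(set(external_depths(t.root))) == 1)   # True
```

Because the tree only ever grows by splitting the root, every external node stays at the same depth, which is exactly the "balance" property stated above.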
Figure 5. Local balancing transformations

procedure initialize;
  begin z ← node(black, , ,); h ← node(black, z, z, −∞) end;

procedure search and insert (integer value v; reference (node) h);
  begin reference (node) x; logical success;
  x ← h; k(z) ← v; i ← 0; success ← true;
  loop until v = k(x):
    P(i) ← x; i ← i+1; if v < k(x) then x ← l(x) else x ← r(x) endif;
  repeat;
  if x = z
    then P(i) ← x ← node(black, z, z, v); m(z) ← red; success ← false;
      if v < k(P(i−1)) then l(P(i−1)) ← x else r(P(i−1)) ← x endif;
      loop while m(l(P(i))) = m(r(P(i))) = red:
        m(l(P(i))) ← m(r(P(i))) ← black; m(P(i)) ← red; i ← i−2;
        if m(P(i+1)) = red then balance(P(i−1), P(i), P(i+1), P(i+2))
          else i ← i+1 endif;
      repeat;
  endif;
  m(r(h)) ← black;
  return(x, success);
  end "search and insert"
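The search loops above need no separate "have we reached an external node?" test: all external links point to the single node z, and storing the search key in it first (k(z) ← v) guarantees that the loop stops. A minimal Python sketch of this sentinel trick for an unbalanced tree follows; `Z`, `make_tree`, `search` and `insert` are hypothetical names, and because `Z` is shared module state the sketch is not thread-safe.

```python
import math


class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


# One shared external node, playing the role of z in the programs above.
Z = Node(None)
Z.left = Z.right = Z


def make_tree():
    """Header node with key -infinity; the tree proper hangs off its right link."""
    return Node(-math.inf, Z, Z)


def search(head, v):
    Z.key = v                       # plant v in z, so the loop below must stop
    x = head
    while v != x.key:               # no separate "x = z?" test in the inner loop
        x = x.left if v < x.key else x.right
    return x is not Z


def insert(head, v):
    """Unbalanced insertion with the same sentinel; returns False if v is present."""
    Z.key = v
    f, x = head, head.right
    while v != x.key:
        f, x = x, (x.left if v < x.key else x.right)
    if x is not Z:
        return False
    node = Node(v, Z, Z)
    if v < f.key:
        f.left = node
    else:
        f.right = node
    return True


head = make_tree()
for k in [50, 20, 70, 10, 30]:
    insert(head, k)
print(search(head, 30), search(head, 25))   # True False
```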
paths to external nodes, then the number of black arcs on these two paths will be the same. (This follows from the defining property of the trees.) Next observe that two consecutive red links never appear on any path. (This follows from the implementation which keeps the red links "inside" the internal nodes.) In fact, an alternate way to view the algorithms is that they make exactly the transformations necessary to maintain both these properties.

Figure 7. Insertion into a 4-node

this condition will restrict the size of connected red subtrees that can arise. Such conditions can often be expressed in many equivalent forms. For 2-3-4 trees, as we have seen, an appropriate condition is
procedure initialize;
  begin y ← node(red, , ,); z ← node(black, y, y, );
  P(−1) ← h ← node(black, z, z, −∞) end;
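The two properties discussed above (the same number of black links on every path to an external node, and never two consecutive red links) can be checked mechanically. The following minimal Python sketch assumes our own record `RB`, where `red` is the color of the link from the father and `None` is an external node; `from_234` (a hypothetical helper) shows how a 2-, 3- or 4-node is kept "inside" one black node joined to red sons, and `black_height` verifies both properties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RB:
    key: int
    red: bool                        # color of the link from the father
    left: Optional["RB"] = None      # None stands for an external node
    right: Optional["RB"] = None


def from_234(keys, kids):
    """Represent one 2-, 3- or 4-node (1 to 3 sorted keys, len(keys) + 1 converted
    subtrees) as a black binary node whose extra keys hang below it on red links."""
    if len(keys) == 1:
        return RB(keys[0], False, kids[0], kids[1])
    if len(keys) == 2:               # 3-node; a red right son would do equally well
        return RB(keys[1], False, RB(keys[0], True, kids[0], kids[1]), kids[2])
    if len(keys) == 3:               # 4-node: black node with two red sons
        return RB(keys[1], False, RB(keys[0], True, kids[0], kids[1]),
                  RB(keys[2], True, kids[2], kids[3]))
    raise ValueError("a 2-3-4 node holds one to three keys")


def black_height(node):
    """Number of black nodes on every path from node down to an external node.
    Raises ValueError if either dichromatic property is violated."""
    if node is None:
        return 0
    for son in (node.left, node.right):
        if node.red and son is not None and son.red:
            raise ValueError(f"two consecutive red links at key {node.key}")
    hl, hr = black_height(node.left), black_height(node.right)
    if hl != hr:
        raise ValueError(f"unequal black counts below key {node.key}")
    return hl + (0 if node.red else 1)


def leaf(*keys):
    return from_234(list(keys), [None] * (len(keys) + 1))


root = from_234([30, 60], [leaf(10, 20), leaf(40), leaf(70, 80, 90)])
print(black_height(root))   # 2
```

The black height equals the height of the underlying 2-3-4 tree, which is why counting only black links recovers the B-tree notion of balance.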
improvement to the standard B-tree algorithms [HaMe]: before splitting a node, check to see if some of the mass can be passed on to a brother.
trees are more strongly balanced than AVL. We can show, for example, that in such a tree of N leaves the longest path is at most 1.34 lg N (as compared to 1.44 lg N for AVL). It is, however, cumbersome to implement these second order trees, as the transformation required is not always a simple single or double rotation: rather, up to nine pointers may have to be changed.
procedure search and insert (integer value v; reference (node) h);
  begin reference (node) x, gg, g, f; logical success;
  x ← h; k(z) ← v; success ← true;
  loop until v = k(x):
    gg ← g; g ← f; f ← x;
    if v < k(x) then x ← l(x) else x ← r(x) endif;
    if m(l(x)) = m(r(x)) = red then
      if x = z then x ← node(black, z, z, v); success ← false;
        if v < k(f) then l(f) ← x else r(f) ← x endif;
      endif;
      m(l(x)) ← m(r(x)) ← black; m(x) ← red;
      if m(f) = red then balance(gg, g, f, x); x ← g endif;
    endif;
  repeat;
  m(r(h)) ← black;
  return(x, success);
  end "search and insert"

Program 5. Top-down 2-3-4 trees.

if m(l(x)) = m(r(x)) = red or m(x) = m(l(x)) = red or m(x) = m(r(x)) = red then
  if x = z then x ← node(black, z, z, v); success ← false endif;
  if m(x) = black then m(l(x)) ← m(r(x)) ← black; m(x) ← red
    else m(x) ← black; m(f) ← red endif;
  if m(x) = black or (m(f) = red and (v < k(g)) ≠ (v < k(f))) then
    if v < k(f) then l(f) ← r(x); r(x) ← f; f ← g
      else r(f) ← l(x); l(x) ← f; f ← g endif
  endif;
  if v < k(f) then l(f) ← x else r(f) ← x endif;
  if m(x) = red and (m(f) = red or f = g) then x ← gg endif;
endif;

Program 6. Top-down 2-3-4 trees (alternate balancing code for Program 5).

trick is to maintain the colors properly. The first part of the double rotation involves no color changes: both nodes involved are red before and after the rotation. The single rotation (which is also the second part of the double rotation) requires that the colors of the two nodes involved be switched. The algorithm proceeds as follows when a color flip causes a node to become red, and the father of that node is also red. First, a single rotation is performed if necessary to make the two red links go in the same direction. Second, the current pointer is set to its great-grandfather node. Third, a single rotation is performed when the two reds in a row are encountered, to complete the balancing operation. This requires an extra test within the inner loop, but the resulting code sequence is quite compact, as shown in Program 6.

It is also possible to implement AVL trees from the top down, by adding the "brother" transformation of Fig. 11 to transform a 4-node

There are many variants of Program 5 which work on the same basic theme: "on the way down the tree, if a node with two red sons is encountered, do a color flip, and if two red links in a row are encountered, balance the tree nodes they connect." A remarkable variety of different tree structures can result depending upon: (i) which node involved in a transformation is the one causing it to happen, (ii) which transformation is given preference when more than one is applicable, and (iii) whether the application of a transformation ought to disable another from happening immediately. Program 5 corresponds to (i) having the color flip done by the top node involved and the balance done by the bottom node involved, (ii) ensuring that only one transformation is applicable at each node, and (iii) disabling the color flip after the balance but enabling the balance after the color flip.

One tempting variant is to (i) have both the color flip and the balance done by the top node involved, (ii) still ensure that only one transformation is applicable at each node, and (iii) not disable any transformations. The sequence of trees produced by such an algorithm for our sample keys is shown in Fig. 14. This algorithm corresponds to 2-3-4 trees whose 4-nodes are represented by three nodes connected with two red links in any orientation. Fewer balancing transformations are involved, but the trees are less balanced, and the algorithm is difficult to implement cleanly, since a node may have to examine both sons and two grandsons.

Another possibility is to change Program 6 to do a color flip after each balance (but then disable further balances which would involve going further back up the tree). Fig. 15 shows a particularly bad sequence of keys for this variant (the initial stages are omitted). Not only is the number of possible red subtrees greatly increased, but also three reds in a row could occur! (Consider, for example, what happens if the last tree in Fig. 15 is connected to some red node and a 12 is inserted. Then two red links are passed up, resulting in three reds in a row.) This example illustrates that some caution must be exercised if good balanced trees are to be produced.

However, there is still a great deal of flexibility left in designing top-down algorithms within this framework. As we shall see in the next section, the algorithms that we have been considering have essentially the same average case performance, so we should look for an algorithm which is easily implemented. One goal might be to find an algorithm which doesn't do any "double" rotations on the way down the tree. It turns out that such an algorithm is easily derived from Program 6, by simply removing all references to gg. The result, given as Program 7, is a method which allows two reds in a row to be encountered on the way down the tree, but only if they are oriented in the same direction. Fig. 16 shows the operation of this algorithm on our sample keys. The allowed connected red subtrees are shown in Fig. 17 (only one from each symmetric pair is included). The meaning of the labels on those trees will be discussed below.

The example above indicates that we must be careful to prove that Program 7 operates in the way that we expect. In particular, we need
Figure 14. Constructing a top-down tree (first variant)
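The top-down strategy of Program 5 (split every 4-node met on the way down by a color flip, and if that creates two red links in a row, balance with a single or double rotation) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a line-by-line transcription: `None` replaces the external node z, the header node with key −∞ follows the programs above, and the balance step of Fig. 5 (not reproduced here) is replaced by our own `_rotate` helper. All names (`TopDown234`, `_split`, `_rotate`, `keys`, `black_height`) are hypothetical.

```python
import math


class Node:
    __slots__ = ("key", "red", "left", "right")

    def __init__(self, key, red):
        self.key, self.red = key, red    # red = color of the link from the father
        self.left = self.right = None    # None plays the role of an external node


def is_red(node):
    return node is not None and node.red


class TopDown234:
    def __init__(self):
        self.head = Node(-math.inf, False)   # header; the tree proper is head.right

    @staticmethod
    def _rotate(v, y):
        """Single rotation of y's son c on v's search path with c's son on that path;
        returns the node that replaces c below y."""
        c = y.left if v < y.key else y.right
        if v < c.key:
            gc = c.left
            c.left, gc.right = gc.right, c
        else:
            gc = c.right
            c.right, gc.left = gc.left, c
        if v < y.key:
            y.left = gc
        else:
            y.right = gc
        return gc

    def _split(self, v, x, p, g, gg):
        """Color flip at x; if x's father p is also red, balance with a single
        rotation (same direction) or a double rotation. Returns the new (x, p)."""
        x.red = True
        for son in (x.left, x.right):
            if son is not None:
                son.red = False
        if p.red:
            g.red = True
            if (v < g.key) != (v < p.key):   # red links point in different directions
                self._rotate(v, g)
            x = p = self._rotate(v, gg)
            x.red = False
        self.head.right.red = False           # the root is always black
        return x, p

    def insert(self, v):
        """One pass from the root down; returns False if v is already present."""
        x = p = g = gg = self.head
        while x is not None:
            if x is not self.head and v == x.key:
                return False
            gg, g, p = g, p, x
            x = x.left if v < x.key else x.right
            if x is not None and is_red(x.left) and is_red(x.right):
                x, p = self._split(v, x, p, g, gg)
        x = Node(v, True)
        if v < p.key:
            p.left = x
        else:
            p.right = x
        self._split(v, x, p, g, gg)           # a new node is a red link; fix a red father
        return True

    def keys(self):
        def walk(node):
            if node is not None:
                yield from walk(node.left)
                yield node.key
                yield from walk(node.right)
        return list(walk(self.head.right))

    def black_height(self):
        """Check both dichromatic properties and return the black height."""
        def walk(node):
            if node is None:
                return 0
            if node.red and (is_red(node.left) or is_red(node.right)):
                raise ValueError("two red links in a row")
            hl, hr = walk(node.left), walk(node.right)
            if hl != hr:
                raise ValueError("unequal numbers of black links")
            return hl + (0 if node.red else 1)
        return walk(self.head.right)


t = TopDown234()
for k in [5, 2, 8, 1, 9, 3, 7, 4, 6, 10]:
    t.insert(k)
print(t.keys() == list(range(1, 11)))   # True
t.black_height()                         # raises ValueError if the tree were malformed
```

Because every 4-node on the search path is split before the search moves below it, the father of the current node is never a 4-node, so a red father can always be fixed locally and no stack of ancestors is needed; four pointers (x, p, g, gg) suffice.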
record node (color m; reference (node) l, r; integer k);
reference (node) h, y, z;
procedure initialize;
  begin y ← node(red, , ,); z ← node(black, y, y, ); h ← node(black, z, z, −∞) end;
The implementation in Program 7 is notable for its brevity: it requires only about 60% as much code as the classical AVL and 2-3 algorithms. The following section shows that it can also be expected to perform as well as these algorithms in a dynamic sense.

3. Performance Comparisons

Since balanced trees are suitable for a wide variety of applications, there are a number of different measures which could be used to compare the various algorithms we have been discussing. In the previous sections we have dealt with some static issues such as program size and overhead required. In this section we shall concentrate on the dynamic statistics of the various algorithms. There are essentially two costs of interest. One is the search cost, when a tree built by one of the algorithms is used for searches only. The other is the insertion, or balancing, cost. The first measures the balance quality of the trees built by the algorithm; the second, the effort consumed in achieving this balance. We have already seen examples which support the intuitive notion that search cost may be traded for insertion cost and vice versa.

The dichromatic framework makes the task of comparing the algorithms somewhat simpler, since the properties of the binary trees produced can be studied in a color-blind manner. (As mentioned above, this corresponds to explicitly counting node-internal comparisons for 2-3 and 2-3-4 trees.) In what follows, we shall concentrate on the cost of unsuccessful searches: the length of the path traversed to an external node. (The cost of successful searches can be derived from this in a standard way [Kn].) In particular, we shall consider three different measures.

One is the worst-case path cost, which is the length of the longest path among all trees of N keys built by the algorithm. The second, and perhaps more representative, is the worst-case path length cost, which is the average search cost for the tree of maximal external path length, among all trees built by the algorithm. And finally we have the average cost, which is the average search cost for a random tree built by the algorithm, under the usual model that the N! possible permutations of the N keys used in building the tree are equally likely. Note also that for a given class of trees, the average, worst-case path length, and worst-case path costs form a non-decreasing sequence of numbers.

For a perfectly balanced tree of N keys the worst-case path, worst-case path length, and average cost are all essentially lg N, so this will form our de facto standard of comparison. Define the fractional cost to be the supremum, as N gets large, of the ratio of the cost in question to lg N. Thus the fractional worst-case path, worst-case path length, and average cost for perfectly balanced trees are all trivially 1. For trees produced by our algorithms, the fractional costs will be $\ge 1$.

The situation for worst-case path cost is the simplest to analyze. It is well-known that for AVL trees the fractional cost is $1/\lg\phi \approx 1.44\ldots$, which is achieved by the Fibonacci trees [Kn]. (A Fibonacci tree of height n is constructed by putting a Fibonacci tree of height n − 2 to the left of the root and one of height n − 1 to the right of the root. The tree of height 0 is a single external node; the tree of height 1 is an internal node with two external sons.) For 2-3 or 2-3-4 trees a tree which is entirely 2-nodes except for one path of 3-nodes gives a fractional cost of 2 (which is clearly the highest possible). Similarly, from the comments in the previous section, the fractional cost for the trees generated by Program 7 is 3.
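The recursive definition of Fibonacci trees given above can be checked directly. This minimal Python sketch (tuples for internal nodes, `None` for external nodes; all names are our own) builds the trees, confirms that the tree of height n has $F_{n+2}$ external nodes, and shows the ratio of the longest path to lg of the tree size drifting up towards $1/\lg\phi$. For simplicity it divides by lg of the number of external nodes rather than keys, which does not change the limit.

```python
import math


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fibonacci_tree(n):
    """Fibonacci tree of height n as nested (left, right) tuples; None is an external node."""
    if n == 0:
        return None                  # a single external node
    if n == 1:
        return (None, None)          # one internal node with two external sons
    return (fibonacci_tree(n - 2), fibonacci_tree(n - 1))


def height(t):
    return 0 if t is None else 1 + max(height(t[0]), height(t[1]))


def externals(t):
    return 1 if t is None else externals(t[0]) + externals(t[1])


for n in range(15):
    t = fibonacci_tree(n)
    assert height(t) == n and externals(t) == fib(n + 2)

# Longest path (n) relative to lg of the tree size approaches 1/lg(phi) slowly.
phi = (1 + math.sqrt(5)) / 2
for n in (30, 300, 3000):
    print(n, n / math.log2(fib(n + 2)))
print(1 / math.log2(phi))            # ≈ 1.4404
```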
For the fractional worst-case path length cost the situation is more difficult and interesting. We have been able to improve on a number of previously known bounds. A common misconception is that Fibonacci trees maximize path length among all AVL trees of the same size. This would be nice, since the fractional cost for Fibonacci trees under this metric is quite low. A Fibonacci tree of height n has $F_{n+2}$ external nodes, and its external path length is defined by the recurrence

$$E_n = E_{n-1} + E_{n-2} + F_{n+2}, \qquad n \ge 2,$$

with $E_0 = 0$, $E_1 = 2$. The solution to this recurrence is

$$E_n = \frac{4n}{5}F_{n+1} + \frac{3n+3}{5}F_n,$$

so the fractional path length cost for a Fibonacci tree of height n is

$$\limsup_{n\to\infty} \frac{\tfrac{4n}{5}F_{n+1} + \tfrac{3n+3}{5}F_n}{F_{n+2}\lg F_{n+2}} = \frac{\tfrac{4}{5\phi} + \tfrac{3}{5\phi^2}}{\lg\phi} = 1.04\ldots$$

(Recall that $F_n = \phi^n/\sqrt{5}$ rounded to the nearest integer, where $\phi = (\sqrt{5}+1)/2$ is the "golden ratio".) Fibonacci trees are only about 4% worse than optimal under this metric.

However, it is possible to construct AVL trees which are much worse. Given a Fibonacci tree of height n and some positive integer k, k < n, we can construct an "overweight" Fibonacci tree by replacing the rightmost (bottommost) Fibonacci subtree of height k by a complete binary tree with $2^k$ external nodes. Fig. 10 shows such a tree with n = 6 and k = 4. By appropriately choosing k, we can get a tree in which asymptotically all paths have the maximal possible length. Specifically, the fractional path length for the overweight Fibonacci tree is

$$\limsup_{n\to\infty} \frac{E_n - E_k + k2^k + (n-k)\bigl(2^k - F_{k+2}\bigr)}{\bigl(F_{n+2} - F_{k+2} + 2^k\bigr)\lg\bigl(F_{n+2} - F_{k+2} + 2^k\bigr)}.$$

If k is chosen so that $2^k$ is about $nF_n$, then this limit equals

$$\limsup_{n\to\infty} \frac{n^2F_n}{nF_n\lg F_n},$$

which approaches $1/\lg\phi$.

For 2-3-4 trees, a similar construction leads to a fractional worst-case path length of 2. The situation for 2-3 trees is less clear. (This problem has been studied by Rosenberg and Snyder [RS].) We can easily upper bound this cost by 2, and an analogous construction to the above yields 2-3 trees with fractional cost of $2 - 1/(3\lg 3)$. We start the construction by building the already considered scrawny trees which have maximum height for their number of leaves. (In the sequel heights will always refer to the dichromatic framework representation of these B-trees.) In the 2-3 or 2-3-4 case such trees are clearly composed of a single path of 3-nodes with every other node a 2-node. (A 2-node is also allowed at the root, if the height is odd.) These scrawny trees naturally correspond to the Fibonacci trees of the AVL case. Without loss of generality, we can assume that the rightmost chain is the one consisting of 3-nodes. To make these trees overweight, we choose a k and replace the rightmost scrawny tree of height k by, in each case, the bushiest possible tree of height k. This bushy tree consists entirely of 3-nodes in the 2-3 case, and entirely of 4-nodes in the 2-3-4 case. An appropriate choice of k as a function of n, the total tree height, now completes the argument. We only present some of the details of the 2-3 argument, as the 2-3-4 argument is somewhat simpler. The fractional worst-case path length cost for the 2-3 tree just constructed is

$$\limsup_{n\to\infty} \frac{\tfrac{5}{6}k\,3^{k/2} + (n-k)\,3^{k/2} + n\,2^{n/2}}{\bigl(3^{k/2} + 2\cdot 2^{n/2}\bigr)\lg\bigl(3^{k/2} + 2\cdot 2^{n/2}\bigr)}.$$

We now let $k = n/\lg 3 + \lg n$. It is then easy to check that the above limsup is $2 - 1/(3\lg 3)$.

Although we have not carried out the construction for the trees of Program 7, it is reasonable to conjecture that a fractional cost of 3 can be obtained.

No balanced tree algorithm has yet been completely analyzed under the average cost metric. The classical bottom-up algorithms are extremely difficult to analyze because they do not "preserve" randomness: given a random tree, its subtrees are not random trees. On the other hand, it is possible that the top-down algorithms may submit to analysis, because they perform their transformations blindly in a consistent fashion. However, even under the most generous randomness assumptions, the recurrences that arise in the analysis seem intractable. The question of whether any of these algorithms are truly asymptotic to lg N on the average is the most fundamental open question in the analysis of balanced trees.

It is possible to do a fringe analysis of the average case behavior assuming that the rebalancing transformations occur only at the "bottom" of the tree. Yao [Y] showed how to compute the average number of 2-nodes and 3-nodes at the bottom levels of 2-3 trees, and Brown [Br] gave some similar results for AVL trees. Neither gave any results concerning path lengths, but these can be derived with the help of the following lemma for (arbitrary) binary search trees.

LEMMA 3. Given any binary search tree with n keys, let the average unsuccessful search cost be $C_n$. Then the average unsuccessful search cost after a random insertion is

$$C_{n+1} = C_n + \frac{2}{n+2}.$$

Proof. The external path length of the tree is $(n+1)C_n$. Each external node is equally likely to receive the insertion, with probability $1/(n+1)$. Notice that if the insertion is at level i, then the external path length increases by i + 2 (two new external nodes are created at level i + 1, less the one at level i). Therefore, the average increase in external path length is

$$\frac{1}{n+1}\sum_{x}\bigl(\mathrm{level}(x) + 2\bigr) = C_n + 2,$$

where the sum is taken over all external nodes x. This leads immediately to the recurrence

$$(n+2)C_{n+1} = (n+1)C_n + C_n + 2,$$

which proves the lemma. ∎

This lemma has a number of interesting consequences. By telescoping the recurrence, we get

$$C_N = C_n + 2H_{N+1} - 2H_{n+1}, \qquad N > n,$$

where $H_N$ denotes the N-th harmonic number [Kn]. (In particular, taking n = 0 and $C_0 = 0$ gives the well-known average unsuccessful search cost for random trees, $C_N = 2H_{N+1} - 2$, which says that the fractional cost for such trees is $2\ln 2 \approx 1.38\ldots$, since $H_N = \ln N + O(1)$.) If we start with a "seed" tree which is perfectly balanced, $C_n = \lg n$, then we get

$$C_N = \lg n + 2\ln(N/n) + O(1),$$

and by taking n large enough, say $n = \Theta(N/\lg N)$, we have trees with an optimum fractional cost,

$$C_N = \lg N + O(\log\log N).$$

This means that no balancing need be done at all if it can be ensured that the tree is perfectly balanced after a sufficient number of keys have been inserted.

Returning to the fringe analysis, let us consider how to calculate the average search cost for 2-3 trees under the assumption that rebalancing is only done at the bottom. Yao showed that the ratio of 2-nodes to 3-nodes on the bottom level is 2:1, so any particular external node belongs to a 3-node with probability 3/7 and a rotation is done on insertion with probability 2/7. If a rotation is done for an insertion on level i, then the external path length is increased by only i + 1, and Lemma 3 is easily modified to take this into account, with the result that the fractional average cost for such trees is $\tfrac{12}{7}\ln 2 \approx 1.188\ldots$. The result is the same for AVL and 2-3-4 trees.

The fringe analysis can be extended upwards by considering the set of possible subtrees of red height 2, etc., but these sets tend to be large, and the calculations quickly become prohibitively difficult. (For AVL trees, M. Brown has remarked that the fringe analysis does not seem to extend in this fashion, because the transformation on a tree depends on the shape of the tree rooted at its brother.) It does appear that the fractional average cost quickly approaches 1. On the average, most of the rotations occur at the bottom levels: those higher up present a bad worst case.
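Lemma 3, its telescoped form, the fringe modification used above for 2-3 trees, and the Fibonacci path-length formula can all be checked with exact rational arithmetic. In this minimal Python sketch (helper names are our own), `average_cost` enumerates every insertion order for small n, and `fringe_cost` assumes, as in the 2-3 fringe model, that each insertion saves `saving` on average, so that $C_{n+1} = C_n + (2 - s)/(n+2)$.

```python
import math
from fractions import Fraction
from itertools import permutations


def harmonic(n):
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def external_path_length(order):
    """Insert keys into an unbalanced binary search tree in the given order and
    return its external path length (sum of the depths of the external nodes)."""
    left, right = {}, {}
    for k in order[1:]:
        node = order[0]
        while True:
            side = left if k < node else right
            if node not in side:
                side[node] = k
                break
            node = side[node]
    total, stack = 0, [(order[0], 0)]
    while stack:
        node, depth = stack.pop()
        for side in (left, right):
            if node in side:
                stack.append((side[node], depth + 1))
            else:
                total += depth + 1       # an external node hangs on this link
    return total


def average_cost(n):
    """Exact C_n: average unsuccessful search cost over all n! insertion orders."""
    orders = list(permutations(range(n)))
    total = sum(external_path_length(o) for o in orders)
    return Fraction(total, len(orders) * (n + 1))


def fringe_cost(N, saving):
    """Lemma 3 when each insertion saves `saving` on average, starting from C_0 = 0."""
    c = Fraction(0)
    for n in range(N):
        c += (2 - saving) / Fraction(n + 2)
    return c


# Random trees (no balancing): C_N = 2 H_{N+1} - 2, checked exactly for small N.
for N in range(1, 7):
    assert average_cost(N) == 2 * harmonic(N + 1) - 2

# 2-3 fringe model: a rotation saving 1 happens with probability 2/7.
assert fringe_cost(50, Fraction(2, 7)) == Fraction(12, 7) * (harmonic(51) - 1)
print(2 * math.log(2), 12 / 7 * math.log(2))    # ≈ 1.386 and ≈ 1.188

# Fibonacci trees: the recurrence for E_n against its closed form, and the 1.04... limit.
F = [0, 1]
while len(F) < 40:
    F.append(F[-1] + F[-2])
E = [0, 2]
for n in range(2, 30):
    E.append(E[n - 1] + E[n - 2] + F[n + 2])
assert all(5 * E[n] == 4 * n * F[n + 1] + (3 * n + 3) * F[n] for n in range(30))
phi = (1 + math.sqrt(5)) / 2
print((4 / (5 * phi) + 3 / (5 * phi ** 2)) / math.log2(phi))   # ≈ 1.042
```

The fractional costs printed follow because $H_N \sim \ln N$, so a cost of $c\,H_N$ divided by lg N tends to $c\ln 2$.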
In general, each of the algorithms that we have considered has a set of allowable connected red subtrees, L. For each tree t in L, the fringe analysis will give us the probability that a random insertion will strike one of the external nodes of t. The average external path length in a tree with N nodes, ignoring rotations above the bottom level, is

$$\Bigl(2 - \sum_{t\in L} p_t \sum_{x\in t} \Delta_x\Bigr)\bigl(H_{N+1} - 1\bigr).$$

In this formula the first sum is over all t in L, and the second over all external nodes x of t. In addition, $p_t$ denotes the probability of hitting a tree of type t, and $\Delta_x$ the saving in path length due to the rotation if one is done and 0 otherwise. (If the external node is at level i this is the difference between i + 2 and the increase in the external path length after the rotation.) Note that rotations at the fringe always reduce the path length; however, this need not be so for rotations higher up in the tree. In fact all the algorithms we have considered can be forced to do rotations that will increase the path length. This is another reason why a complete average case analysis is non-trivial.

As an application, consider the fringe analysis for Program 7, the top-down single rotation trees. Let a, b, c, d, e, f also represent the probabilities that a random insertion strikes an external node in a tree labelled a, b, c, d, e, f respectively in Fig. 17. (Then 2a, 3b, 4c, 4d, 5e, 5f are the probabilities that the respective trees themselves occur.) For simplicity, assume that these probabilities reach steady state after a sufficient number of nodes have been inserted (Yao shows how to make this precise). Then, from Fig. 17, we can write down the equations

$$\begin{aligned}
a &= -2a + 4c + 3e + 3f\\
b &= 2a - 3b + 4c + 4e + 4f\\
c &= b - 4c + e + f\\
d &= 2b - 4d + 2e + 2f\\
e &= 2d - 5e\\
f &= 2d - 5f.
\end{aligned}$$

We also have the normalization condition

$$2a + 3b + 4c + 4d + 5e + 5f = 1.$$

The solution to this set of simultaneous equations (there is one redundant equation) is

$$a = \tfrac{8}{105},\quad b = \tfrac{11}{105},\quad c = \tfrac{3}{105},\quad d = \tfrac{6}{105},\quad e = \tfrac{2}{105},\quad f = \tfrac{2}{105}.$$

Now, the only insertions for which Δ is non-zero are the rightmost three in tree d. The first saves 1, the other two save 2, so the average external path length in a tree with N nodes is

$$(2 - 5d)\bigl(H_{N+1} - 1\bigr) = \tfrac{12}{7}\bigl(H_{N+1} - 1\bigr),$$

the same as for 2-3 or AVL trees! (There are easier ways to prove this result; the intention here was to illustrate a general technique for the fringe analysis of any such algorithm.)

The reader is again reminded that these fringe analyses rigorously prove nothing about the average case performance of the algorithms in the previous sections. We are unable to prove that for a given algorithm (e.g., 2-3 trees) the average case behavior of the fringe variant is an upper bound on the average case behavior of the real algorithm, though we conjecture this to be true. However, they can be taken as analytic evidence that the algorithms perform very well on the average.

From a practical standpoint, simulation studies of balanced tree algorithms consistently show that the fractional average case cost is very close to 1. (See [KFSK], [Kn].) Table 1 gives the results of simulations for the five implementations that we have on five different files of 20,000 nodes each. The main empirical observation that can be made from this table is that on the average all the algorithms have essentially the same behavior. Furthermore, the performance of all the methods seems to be extremely insensitive to the input. Since the external path length of a perfectly balanced 20,000 node tree is 287248, this data may be interpreted as showing that the average-case fractional cost of these algorithms is approximately 1.02.... Unfortunately, even for such large N, the value of lg N is so small that the same data is also consistent with the hypothesis that the fractional cost is 1 (or, in other words, the average external path length is about lg N + O(1)). Though the simulations do not help resolve this theoretical question, they do indicate that the trees are extremely well balanced, since they are within 2% of optimal.

Another point worth noting is that the insertion cost for all of the algorithms is very low. The number of rotations or color flips to be expected is about one every two trips down the tree. Program 7 uses fewer rotations at the expense of a slightly less balanced tree. It is possible to get by with even fewer rotations at the expense of more imbalance: some of the variants mentioned in the previous section have this property. Finally, although one might expect the top-down algorithms to do significantly more rotations than their bottom-up counterparts, the table shows this not to be the case. A direct comparison between the top-down and bottom-up 2-3-4 tree algorithms shows their performance statistics to be extremely similar.

Since the algorithms are so similar in performance, it is wise to pay careful attention to the implementation, which can have a very significant effect on the performance. The empirical studies show that the "inner loop" of the algorithms is the search loop, which must therefore be carefully implemented. If searches are to be done much more often than insertions, it may be advisable to have a separate search procedure, then call "search and insert" if the search was unsuccessful. However, for most applications this is probably not worth the trouble, since the extra overhead in the inner loop for all the balanced tree algorithms is so small. The inner loops of the top-down algorithms can be "unwound" so that they involve only one
more test than a straight search procedure. This compares favorably with the overhead required to maintain the stack (or remember where to start rebalancing) for the bottom-up algorithms. The test for Programs 6 and 7 is slightly more expensive than that for Program 5, but for most applications this is probably worthwhile in view of the simplicity of those algorithms. If search speed is essential or more bits per node are available, then there are other alternatives to consider. For example, on some computers it might be easier to keep the color bits with the links, rather than the nodes. This makes the extra test in the inner loop of the top-down algorithms even simpler to implement.

External path length
Program                     File 1   File 2   File 3   File 4   File 5
2-3 trees (Prog. 1)         292457   292725   291960   292124   292269
2-3-4 trees (Prog. 3)       293315   292680   293727   293010   292464
AVL trees (Prog. 4)         291708   292364   293479   292712   292433
top-down 2-3-4 (Prog. 5)    292816   292364   293479   292712   292433
single rotation (Prog. 7)   294422   294331   294197   294137   294753

Single rotations
Program                     File 1   File 2   File 3   File 4   File 5
2-3 trees (Prog. 1)          12537    12569    12543    12437    12534
2-3-4 trees (Prog. 3)        11643    11571    11558    11563    11567
AVL trees (Prog. 4)          14003    13875    14035    13965    13860
top-down 2-3-4 (Prog. 5)     11852    11758    11769    11726    11783
single rotation (Prog. 7)    11365    11040    11398    11306    11209

Color flips
Program                     File 1   File 2   File 3   File 4   File 5
2-3 trees (Prog. 1)          14912    14970    14938    14922    14848
2-3-4 trees (Prog. 3)        10280    10231    10238    10346    10280
AVL trees (Prog. 4)           9541     9492     9532     9509     9524
top-down 2-3-4 (Prog. 5)     11419    11393    11339    11439    11380
single rotation (Prog. 7)     9931     9967     9998     9948    10022

Table 1. Empirical data for five programs on five random 20,000 node files

4. Further Topics

Balanced trees have utility in a wide variety of applications. Besides search and insertion, many other operations are commonly required of such data structures. Some examples are deletion, splitting, concatenation, and selection. A full discussion of these and other problems is given by Knuth [Kn]. Due to lack of space, all of these problems cannot be considered here, but rather we shall attempt to illustrate some of the machinery involved by considering in detail the deletion problem.

dichromatic framework and the top-down viewpoint can lead to deletion methods which are not much more complex than insertion. This is illustrated by Program 8, which completes the deletion operation for 2-3-4 trees in one top-down pass. It is well known that it suffices to consider the case that the node to be deleted is on the bottom level (has external sons). This is accomplished by doing a search for the node to be deleted, saving its position in t when it is encountered, and continuing until an external node is hit. Then the father of the external node is the successor to the node to be deleted. The deletion is completed by deleting this father after saving its key in the node pointed to by t. Now, if the bottom level node to be deleted is red, it may simply be removed: the difficulty is when a black node must be deleted. Program 8 ensures that this will never be necessary, by pushing a red down from the root to the bottom. The transformations involved are essentially those of Fig. 13 in reverse, with two additions: (i) 3-nodes are rotated, if necessary, so that the red (bottom) node is traversed, and (ii) if a 2-node is encountered which has a 3- or 4-node for a brother, a balance transformation is performed which makes the node being traversed a 3-node. The various transformations are diagrammed in Fig. 18. The sequence of trees resulting from deleting our sample set of keys, in reverse order from insertion, is shown in Fig. 19. Since Program 8 works for any 2-3-4 tree, it may be used on trees built by either
Program 3 or Programs 5 and 6. A similar algorithm is available for
For the classical balanced tree deletion algorithms, deletion is Program 7, but 2-3 trees and AVI.., trees are stilt somcwhat more
generally considered to be harder than insertion. Fortunately, the difficult to handle.
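To make the top-down idea concrete, the following is a minimal Python sketch (Python is an assumption; the paper's programs use their own Algol-like notation). It is not a transcription of Program 8. Instead it follows a widely used top-down red-black deletion scheme built on the same transformations described above: rotate a red onto the search path when the current node sits in a 3-node, and otherwise either reverse the color flip when the brother is a 2-node or borrow from a brother that is a 3- or 4-node. Each node carries a boolean color and a two-element link array. The names Node, single, double, insert, delete, black_height and inorder are hypothetical. As in Program 8, delete remembers the node holding the key (f here, t in the paper), continues to the bottom, copies the successor's key into it, and unlinks the bottom node, which by then is red (or is the lone root). The tree is built by a top-down insertion that splits 4-nodes with color flips on the way down, in the spirit of Program 5.

```python
class Node:
    def __init__(self, key):
        self.key, self.red, self.link = key, True, [None, None]  # link[0] left, link[1] right

def is_red(n):
    return n is not None and n.red

def single(root, d):
    # Rotate toward side d: the son on side 1-d rises (black), root sinks (red).
    save = root.link[1 - d]
    root.link[1 - d] = save.link[d]
    save.link[d] = root
    root.red, save.red = True, False
    return save

def double(root, d):
    root.link[1 - d] = single(root.link[1 - d], 1 - d)
    return single(root, d)

def insert(root, key):
    # Top-down insertion: split 4-nodes with color flips on the way down.
    if root is None:
        root = Node(key)
    else:
        head = Node(None)                     # false root above the real root
        head.red, head.link[1] = False, root
        t, g, p, q = head, None, None, root   # great-grandfather, grandfather, father, current
        d = last = 0
        while True:
            if q is None:                     # bottom reached: attach a new red node
                q = p.link[d] = Node(key)
            elif is_red(q.link[0]) and is_red(q.link[1]):
                q.red = True                  # color flip splits a 4-node
                q.link[0].red = q.link[1].red = False
            if is_red(q) and is_red(p):       # two reds in a row: rotate at g
                d2 = 1 if t.link[1] is g else 0
                t.link[d2] = single(g, 1 - last) if q is p.link[last] else double(g, 1 - last)
            if q.key == key:
                break
            last, d = d, (0 if key < q.key else 1)
            if g is not None:
                t = g
            g, p, q = p, q, q.link[d]
        root = head.link[1]
    root.red = False
    return root

def delete(root, key):
    # Top-down deletion: push a red down the search path so that the node
    # finally unlinked at the bottom level is red.
    head = Node(None)
    head.red, head.link[1] = False, root
    q, p, g, f = head, None, None, None       # f plays the role of t in Program 8
    d = 1
    while q.link[d] is not None:
        last = d
        g, p, q = p, q, q.link[d]
        d = 0 if key < q.key else 1           # ties go right, toward the successor
        if q.key == key:
            f = q
        if not is_red(q) and not is_red(q.link[d]):
            if is_red(q.link[1 - d]):         # q is in a 3-node: rotate its red onto the path
                p.link[last] = single(q, d)
                p = p.link[last]
            else:
                s = p.link[1 - last]          # brother of q
                if s is not None:
                    if not is_red(s.link[0]) and not is_red(s.link[1]):
                        p.red, s.red, q.red = False, True, True   # reverse color flip
                    else:                     # brother is a 3- or 4-node: borrow from it
                        d2 = 1 if g.link[1] is p else 0
                        g.link[d2] = double(p, last) if is_red(s.link[last]) else single(p, last)
                        top = g.link[d2]
                        q.red = top.red = True
                        top.link[0].red = top.link[1].red = False
    if f is not None:                         # move the successor's key up, unlink q
        f.key = q.key
        p.link[1 if p.link[1] is q else 0] = q.link[1 if q.link[0] is None else 0]
    root = head.link[1]
    if root is not None:
        root.red = False
    return root

def black_height(n, lo=None, hi=None):
    # Check search order, no red node with a red son, equal black counts on every path.
    if n is None:
        return 1
    assert (lo is None or lo < n.key) and (hi is None or n.key < hi)
    assert not (n.red and (is_red(n.link[0]) or is_red(n.link[1])))
    hl, hr = black_height(n.link[0], lo, n.key), black_height(n.link[1], n.key, hi)
    assert hl == hr
    return hl + (0 if n.red else 1)

def inorder(n):
    return [] if n is None else inorder(n.link[0]) + [n.key] + inorder(n.link[1])
```

In use, one might insert a sequence of keys and then delete them in reverse order, as in Fig. 19, calling black_height after each step to confirm the balance properties; the tree should end up empty. Because the deletion only assumes a valid red-black representation of a 2-3-4 tree, it does not depend on which insertion method built the tree.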
The previous paragraph raised some issues about concurrent access to our trees. As we have already mentioned, the top-down approach implies that inserters and readers do not interfere as long as they lock a small bounded context in the tree around themselves. In fact, it is possible to do the rebalancing in the "shadow" of the real tree, with the result that readers are never locked out at all. The only penalty is that writers will then have to lock a slightly wider context. Deleters are somewhat more difficult to handle; the only difficulty is the dangling reference t in the middle of the tree. One can then either lock the search path below t, or else rotate t to the bottom of the tree. (This can be done by a sequence of rotations which maintain the defining balance properties.) Many other ramifications

    procedure delete (integer value v; reference (node) h);
    begin reference (node) x, g, f, t, b, bb;
        x ← h; k(z) ← −∞; t ← nil;
        if m(l(r(h))) = m(r(r(h))) = black then m(r(h)) ← red endif;
        loop until x = z:
            g ← f; f ← x;
            if v < k(x) then x ← l(x); b ← r(x); bb ← l(b)
            else x ← r(x); b ← l(x); bb ← l(b) endif;
            if v = k(x) then t ← x endif;
            if m(x) = m(f) = black and m(b) = red then balance(g, f, b, bb) endif;
            if m(x) = m(l(x)) = m(r(x)) = black then
                m(x) ← red; m(f) ← black;
                if m(b) = m(l(b)) = m(r(b)) = black then m(b) ← red
                elseif m(l(b)) = red then balance(g, f, b, l(b))
                elseif m(r(b)) = red then balance(g, f, b, r(b))
                endif;
            endif;
        repeat;
        m(r(h)) ← m(z) ← black;
        if t ≠ nil then if v < k(g) then l(g) ← z else r(g) ← z endif;
            k(t) ← k(f) endif;
        return(t);
    end "delete"

Program 8. Deletion for 2-3-4 trees.

5. References

[AHU] Aho A., Hopcroft J., and Ullman J., The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974.

[AVL] Adelson-Velskii G., and Landis E., On an information organization algorithm, Doklady Akademii Nauk SSSR, 146 (1962), 263-266.

[Ba] Bayer R., Symmetric binary B-trees: data structure and maintenance algorithms, Acta Informatica, 1 (1972), 290-306.

[BaMc] Bayer R., and McCreight E., Organization and maintenance of large ordered indices, Acta Informatica, 1 (1972), 173-179.

[Br1] Brown M., A partial analysis of height-balanced trees, SIAM J. Comp., (to appear).

[Br2] Brown M., A storage scheme for height-balanced trees, IPL, (to appear).

[KFSK] Karlton P., Fuller S., Scroggs R., and Kaehler T., Performance of height-balanced trees, CACM 19 (1976), 23-28.

[Kn] Knuth D., The Art of Computer Programming, Vol. III, Sorting and Searching, Addison-Wesley, 1973.

[KuLe] Kung H.T., and Lehman P.L., A concurrent data-base manipulation problem: binary search trees, to appear in the proceedings of the 1978 Large Data-Base Conference, Berlin.

[RND] Reingold E., Nievergelt J., and Deo N., Combinatorial Algorithms: Theory and Practice, Prentice-Hall, 1977.

[RS] Rosenberg A., and Snyder L., Minimal-comparison 2,3-trees, SIAM J. Comp., (to appear).

[Y] Yao A., On random 2-3 trees, Acta Informatica, 9 (1978), 159-170.