A DICHROMATIC FRAMEWORK FOR BALANCED TREES

Leo J. Guibas
Xerox Palo Alto Research Center, Palo Alto, California, and Carnegie-Mellon University

Robert Sedgewick*
Program in Computer Science, Brown University, Providence, R. I.

ABSTRACT

In this paper we present a uniform framework for the implementation and study of balanced tree algorithms. We show how to imbed in this framework the best known balanced tree techniques and then use the framework to develop new algorithms which perform the update and rebalancing in one pass, on the way down towards a leaf. We conclude with a study of performance issues and concurrent updating.

0. Introduction

Balanced trees are among the oldest and most widely used data structures for searching. These trees allow a wide variety of operations, such as search, insertion, deletion, merging, and splitting, to be performed in time O(lg N), where N denotes the size of the tree [AHU], [Kn]. (Throughout the paper lg will denote log to the base 2.) A number of different types of balanced trees have been proposed, and while the related algorithms are often conceptually simple, they have proven cumbersome to implement in practice. Also, the variety of such trees and the lack of good analytic results describing their performance have made it difficult to decide which is best in a given situation.

In this paper we present a uniform framework for the implementation and study of balanced tree algorithms. The framework deals exclusively with binary trees which contain two kinds of nodes: internal and external. Each internal node contains a key (chosen from a linear order) and has two links to other nodes (internal or external). External nodes contain no keys and have null links. If such a tree is traversed in symmetric order [Kn] then the internal nodes will be visited in increasing order of their keys. A second defining feature of the framework is that it allows one bit per node, called the color of the node, to store balance information. We will use red and black as the two colors. In section 1 we further elaborate upon this dichromatic framework and show how to imbed in it the best known balanced tree algorithms. In doing so, we will discover surprising new and efficient implementations of these techniques.

In section 2 we use the framework to develop new balanced tree algorithms which perform the update and rebalancing in one pass, on the way down towards a leaf. As we will see, this has a number of significant advantages over the older methods. We shall examine a number of variations on a common theme and exhibit full implementations which are notable for their brevity. One implementation is examined carefully, and some properties about its behavior are proved.

In both sections 1 and 2 particular attention is paid to practical implementation issues, and complete implementations are given for all of the important algorithms. This is significant because one measure under which balanced tree algorithms can differ greatly is the amount of code required to actually implement them.

Section 3 deals with the analysis of the algorithms. New results are given for the worst case performance, and a technique for studying the average case is described. While no balanced tree algorithm has yet satisfactorily submitted to an average case analysis, empirical results are given which show that the various algorithms differ only slightly in performance. One implication of this is that the top-down algorithms of section 2 can be recommended for most applications because of their simplicity.

Finally, in section 4, we discuss some other properties of the trees. In particular, a one-pass top-down deletion algorithm is presented. In addition, we consider how to decouple the balancing from the updating operations, and we explore parallel updating.

* This work was done in part while this author was a Visiting Scientist at the Xerox Palo Alto Research Center and in part under support from the National Science Foundation, grant no. MCS75-23738.

CH1397-9/78/0000-0008$00.75 © 1978 IEEE

1. The Uniform Framework

In this section we present a uniform framework for describing balanced trees. We show how to embed in this framework the most widely used balanced tree schemes, namely B-trees [BaMc] and AVL trees [AVL]. In fact this embedding will give us interesting and novel implementations of these two schemes.

We consider rebalancing transformations which maintain the symmetric order of the keys and which are local to a small portion of the tree, for obvious efficiency reasons. These transformations will change the structure of the tree in the same way as the single and double rotations used by AVL trees [Kn]. The difference between the various algorithms we discuss arises in the decision of when to rotate, and in the manipulation of the node colors.

For our first example, let us consider the implementation of 2-3 trees, the simplest type of B-tree. Recall that a 2-3 tree consists of 2-nodes, which have one key and two sons, 3-nodes, which have two keys and three sons, and external nodes, which have no keys and no sons. Inserting a new key into a 2-3 tree involves first doing an unsuccessful search which terminates at an external node, then inserting the new key into the father of that node. If this is a 3-node, it must be split into two 2-nodes, and the overflow key inserted into its father, and so on. The "balance" in the tree comes from the fact that all paths starting at an internal node and ending at an external node have the same length.

Figure 3. Insertion into a 3-node

A natural way to represent a 2-3 tree as a binary tree is to make the explicit links of the tree black and to "binarize" the 3-nodes by connecting their two keys with a red link, as shown in Fig. 1. (It is more convenient to draw colored links than colored nodes, so we establish the convention that the color of a node is equivalent to the color of the link which points to it. In our figures heavy lines will indicate red links.) Note that this corresponds to keeping track of the total number of key comparisons involved when traversing such a node.
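The binarization just described can be illustrated with a small Python sketch. It is not taken from the paper: the tuple encoding of 2-3 trees, the `Node` class, and the helper names `binarize`, `black_links_below` and `symmetric_order` are assumptions made for this example. External nodes are represented by `None`, and every 3-node is made to lean to the right.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = True, False


@dataclass
class Node:
    key: int
    color: bool                       # color of the link pointing to this node
    left: Optional["Node"] = None     # None stands for an external node
    right: Optional["Node"] = None


def binarize(t):
    """Hypothetical input format: None is an external node; an internal node is
    (keys, sons) with one key and two sons (2-node) or two keys and three sons (3-node)."""
    if t is None:
        return None
    keys, sons = t
    if len(keys) == 1:
        return Node(keys[0], BLACK, binarize(sons[0]), binarize(sons[1]))
    if len(keys) == 2:                # the two keys are joined by a red link (leaning right)
        red_son = Node(keys[1], RED, binarize(sons[1]), binarize(sons[2]))
        return Node(keys[0], BLACK, binarize(sons[0]), red_son)
    raise ValueError("not a 2-node or a 3-node")


def black_links_below(n):
    """Number of black links on every path from n to an external node."""
    if n is None:
        return 0
    counts = {(0 if s is not None and s.color == RED else 1) + black_links_below(s)
              for s in (n.left, n.right)}
    if len(counts) != 1:
        raise ValueError("paths carry different numbers of black links")
    return counts.pop()


def symmetric_order(n):
    return [] if n is None else symmetric_order(n.left) + [n.key] + symmetric_order(n.right)


example = ([3, 7], [([1], [None, None]), ([5], [None, None]), ([8, 9], [None, None, None])])
root = binarize(example)
print(symmetric_order(root), black_links_below(root))  # [1, 3, 5, 7, 8, 9] 2
```

Because every level of the 2-3 tree contributes exactly one black link, `black_links_below` on the binarized root returns the common length of all root-to-external paths of the original 2-3 tree (2 in the example), and the symmetric order of the keys is unchanged.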

Figure 1. The binarization of a 3-node

To describe the dynamic aspects of the implementation, it is sufficient to consider insertion into 2- and 3-nodes. Insertion into a 2-node gives a 3-node in the obvious way, as shown in Fig. 2. Insertion into a 3-node requires more work, as diagrammed in Fig. 3. (This diagram assumes that the 3-node "leans" to the right; the other case is obviously symmetric.) The three cases here are quite different. With respect to the structure of the tree, the first case is a simple insertion, the second is a so-called "single rotation", and the third is a "double rotation". In all three cases, the color of the top node is eventually changed to red, with its sons and grandsons all black. This corresponds to splitting the 3-node into two 2-nodes: the red link is the message that an insertion needs to be performed into the node above, using the same method.

Fig. 4 shows the sequence of 2-3 trees that results when this method is used to insert the keys 1, 9, 2, 8, 3, 7, 4, 6, and 5 in this order into an initially empty tree. (This sequence is a well-known example which produces a completely unbalanced "zigzag" tree if no balancing at all is done.) Note that inserting the 2 causes a double rotation, the 3 causes a single rotation, the 4 causes two double rotations, and the 5 causes a single rotation.

An implementation of search and insertion for 2-3 trees within this framework is given in Program 1. For convenience, this implementation uses the normal technique of including two artificial nodes: a "head" node (h) whose right link points to the root of the tree, and a special node (z) which represents all external nodes. The key being sought is first stored in z, so that the search always terminates successfully. If the search terminates at z, then the search was really unsuccessful, and an insertion is performed. A stack is used to keep track of the path traversed on the way down the tree. (The stack could be eliminated by remembering the last 2-node encountered, but we shall see better methods of doing so later.) The single and double rotations are handled with a single procedure called "balance", which is given in Program 2. This procedure takes as input links to four consecutive nodes down the path and changes the structure of the tree to locally balance the bottom three of these nodes, as shown in Fig. 5. (Only the two cases with f as a right link are diagrammed; the cases with f as a left link are symmetric.) Also, the colors of the nodes pointed to by the g and f links after the structural transformations are interchanged. It is easily checked that if the f and x links are both red then this procedure performs exactly the second and third transformations of Fig. 3. (The program is more general than seems necessary because we shall have occasion to use it within several other algorithms.)

The implementation makes obvious an extension to 2-3-4 trees, which also fit nicely into the dichromatic framework. The idea is to allow 4-nodes (nodes with three keys and four sons), which are represented as in Fig. 6. Now splitting only has to be done upon insertion into a 4-node, and the split corresponds to simply complementing the colors of the three nodes involved, as shown in Fig. 7. (We shall refer to this operation as a "color flip".) Insertion into a 3-node may require a single or double rotation to convert it into a 4-node: the necessary transformations are exactly the transformations of Fig. 3 without the color flips. Fig. 8 shows the sequence of 2-3-4 trees corresponding to Fig. 4. The implementation of this method corresponds to moving the test for "balance" out of the splitting loop, as shown in Program 3. Insertion into a 2-3-4 tree involves a sequence of color flips and at most one rotation; insertion into a 2-3 tree may involve many rotations. These trees have been called "symmetric binary B-trees" by R. Bayer [Ba].
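The two primitive operations described above, the color flip and the local `balance` procedure of Program 2, can be sketched in Python. This is an illustrative transcription under stated assumptions, not the paper's code: the `Node` class is hypothetical, `None` replaces the external node z, and since Python has no value-result parameters, `balance` returns the new top of the rebalanced subtree. As in Program 2, the direction of each link is recovered by comparing the search key `v` with the keys on the path.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = True, False


@dataclass
class Node:
    key: float
    color: bool                       # color of the link pointing to this node
    left: Optional["Node"] = None     # None stands for an external (black) node
    right: Optional["Node"] = None


def color_flip(n):
    """Split a binarized 4-node (black node, two red sons) by complementing
    the colors of the three nodes involved, as in Fig. 7."""
    for node in (n, n.left, n.right):
        node.color = not node.color


def balance(gg, g, f, x, v):
    """Sketch of Program 2: locally rebalance g -> f -> x (consecutive nodes on
    the search path for key v), interchange the colors of the two nodes moved by
    the final single rotation, and link the new top under gg. Returns the new top."""
    if (v < g.key) != (v < f.key):      # zig-zag: first rotate x above f
        if v < f.key:
            f.left, x.right = x.right, f
        else:
            f.right, x.left = x.left, f
        f = x
    if v < g.key:                       # then rotate f above g
        g.left, f.right = f.right, g
    else:
        g.right, f.left = f.left, g
    f.color, g.color = g.color, f.color
    if v < gg.key:
        gg.left = f
    else:
        gg.right = f
    return f


# Inserting 7 into the 3-node (5 10), here leaning left (the mirror image of Fig. 3).
h = Node(float("-inf"), BLACK)          # head node; the tree hangs off its right link
g, f, x = Node(10, BLACK), Node(5, RED), Node(7, RED)
h.right, g.left, f.right = g, f, x
top = balance(h, g, f, x, 7)            # now a binarized 4-node: 7 with red sons 5 and 10
color_flip(top)                         # split: 7 becomes red, 5 and 10 become 2-nodes
print(top.key, top.color == RED, top.left.color == BLACK, top.right.color == BLACK)  # 7 True True True
```

The example reproduces the double-rotation case of Fig. 3: after `balance` the three keys form a binarized 4-node, and the color flip leaves 7 red, which is the message that 7 must be inserted into the node above.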

Figure 2. Insertion into a 2-node

Figure 4. Constructing a 2-3 tree

    record node (color m; reference (node) l, r; integer k);
    reference (node) h, z;
    reference (node) array p(0::50); integer i;

    procedure initialize;
      begin z ← node(black, , , ); h ← node(black, z, z, -∞); end;

    procedure search and insert (integer value v; reference (node) h);
      begin reference (node) x; logical success;
      x ← h; k(z) ← v; i ← 0; success ← true;
      loop until v = k(x):
        p(i) ← x; i ← i+1;
        if v < k(x) then x ← l(x) else x ← r(x) endif;
      repeat;
      if x = z
        then p(i) ← x ← node(black, z, z, v); m(z) ← red; success ← false;
             if v < k(p(i-1)) then l(p(i-1)) ← x else r(p(i-1)) ← x endif;
             loop while m(l(p(i))) = m(r(p(i))) = red:
               m(l(p(i))) ← m(r(p(i))) ← black; m(p(i)) ← red; i ← i-2;
               if m(p(i+1)) = red then balance(p(i-1), p(i), p(i+1), p(i+2))
                                  else i ← i+1 endif;
             repeat;
      endif;
      m(r(h)) ← black;
      return(x, success);
    end "search and insert"

Program 1. 2-3 trees.

Figure 5. Local balancing transformations

Let us now make a few observations about these implementations. First, note that if we start with an internal node and follow any two

paths to external nodes, then the number of black arcs on these two paths will be the same. (This follows from the defining property of the trees.) Next observe that two consecutive red links never appear on any path. (This follows from the implementation, which keeps the red links "inside" the internal nodes.) In fact, an alternate way to view the algorithms is that they make exactly the transformations necessary to maintain both these properties.

Figure 6. The binarization of a 4-node

Figure 7. Insertion into a 4-node

A variety of balanced tree algorithms can be described in terms of these simple transformations on dichromatic binary trees. In general, we shall consider families of trees which satisfy the following two conditions:

1. External nodes are black; internal nodes may be red or black.

2. For each internal node, all paths from it to external nodes contain the same number of black arcs.

There will further be a third condition, depending on the family of trees we consider, which expresses the "balance" property. In essence this condition will restrict the size of connected red subtrees that can arise. Such conditions can often be expressed in many equivalent forms. For 2-3-4 trees, as we have seen, an appropriate condition is

3. (2-3-4) No path from an internal node to an external node contains two red links in a row.

Another way to express this condition is

3'. (2-3-4) The only allowed connected red subtrees are those shown in Fig. 9.

It is straightforward to check that any binary tree satisfying conditions 1, 2, and 3 (or 3') can be uniquely "decoded" into a 2-3-4 tree. (However, not all such trees can be produced by Program 3.)
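Conditions 1 through 3 can be checked mechanically. The following Python sketch is illustrative only; `Node`, `check_234` and `satisfies_234` are names assumed for this example. It represents external nodes by `None`, treated as black, so condition 1 holds by construction, and it verifies condition 2 (equal numbers of black links) and condition 3 (no two red links in a row). It does not constrain the color of the link into the root.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = True, False


@dataclass
class Node:
    key: int
    color: bool                       # color of the link pointing to this node
    left: Optional["Node"] = None     # None is an external node, black by convention
    right: Optional["Node"] = None


def check_234(n):
    """Return the number of black links on every path below n, or raise
    ValueError if condition 2 or condition 3 (2-3-4) fails in this subtree."""
    if n is None:
        return 0
    counts = set()
    for son in (n.left, n.right):
        son_red = son is not None and son.color == RED
        if son_red and n.color == RED:
            raise ValueError(f"two red links in a row below key {n.key}")
        counts.add((0 if son_red else 1) + check_234(son))
    if len(counts) != 1:
        raise ValueError(f"unequal numbers of black links below key {n.key}")
    return counts.pop()


def satisfies_234(root):
    """Conditions 1-3 for 2-3-4 trees (condition 1 holds since externals are None)."""
    try:
        check_234(root)
    except ValueError:
        return False
    return True


ok = Node(5, BLACK, Node(3, RED, Node(2, BLACK), Node(4, BLACK)), Node(7, BLACK))
bad = Node(5, BLACK, Node(3, RED, Node(2, RED)), Node(7, BLACK))
print(satisfies_234(ok), satisfies_234(bad))  # True False
```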

Figure 8. Constructing a 2-3-4 tree

Figure 9. The allowed red subtrees in a 2-3-4 tree

    procedure balance (reference (node) value result gg, g, f, x);
      begin
      if (v < k(g)) ≠ (v < k(f))
        then if v < k(f) then l(f) ← r(x); r(x) ← f else r(f) ← l(x); l(x) ← f endif;
             f ← x;
      endif;
      if v < k(g) then l(g) ← r(f); r(f) ← g else r(g) ← l(f); l(f) ← g endif;
      m(f) ↔ m(g); g ← f;
      if v < k(gg) then l(gg) ← g else r(gg) ← g endif;
      end "balance"

Program 2. Local balancing.

    procedure initialize;
      begin y ← node(red, , , ); z ← node(black, y, y, );
      p(-1) ← h ← node(black, z, z, -∞); end;

    loop while m(l(p(i))) = m(r(p(i))) = red:
      m(l(p(i))) ← m(r(p(i))) ← black; m(p(i)) ← red; i ← i-2;
    repeat;
    if m(p(i+1)) = m(p(i+2)) then balance(p(i-1), p(i), p(i+1), p(i+2)) endif;

Program 3. 2-3-4 trees (new initialization and balancing loop for Program 1).

Note that since no two reds can appear in a row, the "binarized" 2-3-4 tree satisfies the property that from any node the ratio of the longest to the shortest path to an external node is at most 2 (take one all-black path and another alternating between red and black). From this it follows immediately that if the tree has N internal nodes, then the length of the longest path is O(lg N) (in fact at most about 2 lg N), and so a search or insertion takes logarithmic time in the worst case. All of the algorithms that we discuss have this property.

For another example, consider the extension of the above properties to handle general B-trees. An appropriate "balance condition" is:

3'. (B-tree of order m) The only allowed connected red subtrees are perfectly balanced trees with ⌈m/2⌉ - 1 to m - 1 nodes.

The implementation of a B-tree of order m involves representing a node with x sons by a perfectly balanced subtree of x leaves. As in the 2-3-4 case, insertions into such a subtree will sometimes require generalized rotations to keep it perfectly balanced. When the red subtree grows to have m+1 leaves, it "splits" by performing a color flip at the root, thus giving rise to two smaller red subtrees. We have not yet precisely defined the term "perfectly balanced", and this leaves some flexibility in the implementation. If we define a "perfectly balanced" tree as one in which, for each node, the number of nodes in the left subtree differs by at most one from the number of nodes in the right subtree, then we get the standard B-tree implementation. Another plausible definition is to call a tree "perfectly balanced" if all its external nodes appear on the bottom two levels. This way requires fewer transformations to maintain, but it will not generally "split" nodes exactly in half without an accompanying generalized rotation. In either case, an implied but weaker (equivalent when m is a power of 2) condition is

3. (B-tree of order m) No path from an internal node to an external node contains more than ⌈lg m⌉ - 1 consecutive red links.

The usual implementation of B-trees involves storing nodes as sorted lists of keys in separate pages. This may be characterized as an implementation which avoids explicitly storing red links. (In fact the "perfectly balanced" condition becomes irrelevant in such implementations, since for each node size only one connected red subtree is allowed, namely the one whose shape is determined by the search strategy within the page.)

The above characterization of B-trees in the dichromatic framework requires balancing both to limit the size of the connected red subtrees which arise and to keep those subtrees locally balanced. An alternative is to not require local balancing at all. This leads to fewer transformations, but ones which are more complex. Also, the resulting trees are less balanced. For the 2-3-4 case, this corresponds to doing simple insertions into 3-nodes (thus allowing 4-nodes to be represented in three different ways, two of which have two red arcs in a row), and doing the local balance upon insertion into the 4-node.

It is much more surprising that AVL trees can also be embedded in our dichromatic framework. These are trees in which the heights of the subtrees rooted at the sons of each node differ by at most one. This balance condition appears, at first sight, to be of a quite different nature than those we have been considering. Bayer [Ba] essentially showed that every AVL tree is a 2-3-4 tree; in the dichromatic notation this can be described quite succinctly. Define the height of a node to be the length of the longest path from that node to an external node. To make an AVL tree into a 2-3-4 tree, simply color red exactly those links which go from a node at an even height to a node at an odd height. Fig. 10 shows how a Fibonacci tree [Kn] looks after coloring. Now, condition 3 above for 2-3-4 trees follows immediately. Condition 2 can easily be proven by induction on the height of the tree, using the stronger hypothesis that every node of height h has exactly ⌈h/2⌉ black links on every path to an external node. (Not every AVL tree can be viewed as a 2-3 tree, however.)

Figure 10. An AVL tree colored as a 2-3-4 tree

Figure 11. The AVL rotation

Figure 12. Constructing an AVL tree

    procedure initialize;
      begin y ← node(red, , , ); z ← node(black, y, y, );
      p(-3) ← p(-2) ← p(-1) ← h ← node(black, z, z, -∞); end;

    loop while m(l(p(i))) = m(r(p(i))) = red:
      m(l(p(i))) ← m(r(p(i))) ← black;
      m(p(i)) ← red; i ← i-2;
      if v < k(p(i-1)) then b ← r(p(i-1)) else b ← l(p(i-1)) endif;
      if m(b) = m(l(b)) = m(r(b)) = black and m(l(p(i))) = m(r(p(i))) = red
        then m(p(i+1)) ← black; m(b) ← red; i ← i-1 endif;
    repeat;
    if m(p(i+1)) = m(p(i+2)) then balance(p(i-1), p(i), p(i+1), p(i+2)) endif;

Program 4. AVL trees (new initialization and balancing loop for Program 1).

Observe that a 2-node corresponds to a node at an odd height whose father is at an odd height (2 greater); a 3-node corresponds to a node at an even height with one son at an odd height (1 less) and one son at an even height (2 less); and a 4-node corresponds to a node at an even height whose sons are both at an odd height (1 less). Conversely we note that a node with two red sons is balanced; a node with one red son and one black son is heavy (in the AVL sense) towards the red son; and a node with both sons black has a balance factor analogously determined by the colors of its grandsons. We need go no further, as not all grandsons can be black. (Why?)
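The coloring rule for AVL trees can be tried out on a Fibonacci-shaped tree. The sketch below is illustrative only: the key-less `Node` class and the functions `fibonacci_shaped`, `color_avl`, `black_links_below` and `internal_nodes` are assumptions of this example, and external nodes (`None`) are taken to have height 0, as in the definition of height above. It colors red exactly the links from a node of even height to a son of odd height, so that the checks below can confirm that no two red links are consecutive and that a node of height h has ⌈h/2⌉, computed as (h + 1) // 2, black links on every path below it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    left: Optional["Node"] = None     # None is an external node, of height 0
    right: Optional["Node"] = None
    red: bool = False                 # color of the link pointing to this node


def fibonacci_shaped(h):
    """An AVL tree of height h with the fewest possible nodes (keys omitted)."""
    if h == 0:
        return None
    if h == 1:
        return Node()
    return Node(fibonacci_shaped(h - 1), fibonacci_shaped(h - 2))


def height(n):
    """Length of the longest path from n to an external node."""
    return 0 if n is None else 1 + max(height(n.left), height(n.right))


def color_avl(n, father_height=None):
    """Color red exactly the links from a node of even height to a son of odd height."""
    if n is None:
        return
    h = height(n)
    n.red = father_height is not None and father_height % 2 == 0 and h % 2 == 1
    color_avl(n.left, h)
    color_avl(n.right, h)


def black_links_below(n):
    """Black links on every path from n to an external node (condition 2)."""
    if n is None:
        return 0
    counts = {(0 if s is not None and s.red else 1) + black_links_below(s)
              for s in (n.left, n.right)}
    if len(counts) != 1:
        raise ValueError("condition 2 violated")
    return counts.pop()


def internal_nodes(n):
    if n is not None:
        yield n
        yield from internal_nodes(n.left)
        yield from internal_nodes(n.right)


tree = fibonacci_shaped(7)
color_avl(tree)
print(black_links_below(tree), (7 + 1) // 2)  # 4 4
```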

With this correspondence in mind, we can transform our algorithm for 2-3-4 tree insertion into an algorithm for AVL tree insertion by the addition of a simple test. Insertion into a 4-node implies that the height of the two nodes on the insertion path is incremented by 1: the color flip of Fig. 7 is precisely the necessary transformation. Let us call the node newly painted red x. To maintain the AVL property, it is necessary to check the brother of x. If the brother's height is now two less than that of x, a rebalancing transformation is necessary. It turns out that one local balancing transformation, a single or double rotation as in Fig. 5, involving the three nodes above x (along with two color changes) suffices to terminate the insertion. This transformation is shown in Fig. 11. (Only one case is shown; the other three are similar.) On the other hand, if the height of the brother of x is now equal to or one less than that of x, then we proceed as before: the red link from the color flip is the message that the subtree below has increased in height by 1 (from even to odd) and that the height of its brother is equal or one less. With this extra condition the transformations of Figs. 2 and 3 suffice to terminate the insertion when a 2-node or 3-node is encountered. Program 4 gives an implementation for AVL trees based on these comments. (It makes use of the fact that a 4-node has height one greater than its brother if and only if both its brother and its father are black.) The tree sequence for our sample insertions is given in Fig. 12.

Another way to view this implementation is that the new statement checks to see if an overflowing 4-node has a brother which is a 2-node. In that case we do a rotation instead of a color flip and terminate the algorithm. After the rotation we have a 4-node and a 3-node, where before we had a 2-node and an overflowing 4-node. This is exactly one instance of Bayer and McCreight's suggested improvement to the standard B-tree algorithms [BaMc]: before splitting a node, check to see if some of the mass can be passed on to a brother.
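In B-tree terms, the "brother" heuristic amounts to passing a key through the father instead of splitting. The following Python sketch shows only that bookkeeping at the leaf level of a B-tree, with nodes as sorted key lists and no child pointers; the function name `shift_right` and its parameters are hypothetical, and the dichromatic version in Program 4 achieves the same effect with a rotation.

```python
def shift_right(father, i, left, right, max_keys=3):
    """Leaf-level 'brother' heuristic for a B-tree (max_keys=3 gives 2-3-4 trees).

    father[i] separates the adjacent sons left and right, all sorted key lists
    modified in place. If left has overflowed and right still has room, pass one
    key through the father instead of splitting. Returns True if a key was passed."""
    if len(left) <= max_keys or len(right) >= max_keys:
        return False
    right.insert(0, father[i])        # the separator moves down into the brother
    father[i] = left.pop()            # the largest key of the overflowing node moves up
    return True


father, left, right = [5], [1, 2, 3, 4], [6]   # overflowing 4-node next to a 2-node
shift_right(father, 0, left, right)
print(left, father, right)  # [1, 2, 3] [4] [5, 6]
```

As in the text, the result is a 4-node and a 3-node where before there was an overflowing 4-node and a 2-node, and the symmetric order of the keys is preserved.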

If we implement the full "brother" heuristic, that is, also transform an overflowing 4-node with a brother which is a 3-node into two 4-nodes, then we obtain a different kind of tree, which might be called a second order AVL tree. These trees satisfy the following stronger height condition: if we consider the four subtrees at depth two below any node, then the heights of these subtrees are within one of each other (for AVL they could differ by as much as two). Thus these trees are more strongly balanced than AVL trees. We can show, for example, that in such a tree of N leaves the longest path is at most 1.34 lg N (as compared to 1.44 lg N for AVL). It is, however, cumbersome to implement these second order trees, as the transformation required is not always a simple single or double rotation: rather, up to nine pointers may have to be changed.
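The 1.44 constant quoted for AVL trees comes from the standard Fibonacci-tree argument [Kn], summarized below for reference; the 1.34 constant for second order trees is not rederived here. N(h) denotes the fewest internal nodes an AVL tree of height h can have, the Fibonacci numbers are indexed with F_1 = F_2 = 1, and L = N + 1 is the number of external nodes (the "leaves" of the text).

```latex
% N(h): fewest internal nodes in an AVL tree of height h; F_1 = F_2 = 1
N(0) = 0, \qquad N(1) = 1, \qquad N(h) = N(h-1) + N(h-2) + 1
  \;\Longrightarrow\; N(h) = F_{h+2} - 1

% Fibonacci growth, with \varphi = (1 + \sqrt{5})/2
F_n \ge \varphi^{\,n-2} \qquad (n \ge 1)

% L = N + 1 external nodes
L = N + 1 \ge F_{h+2} \ge \varphi^{h}
  \;\Longrightarrow\;
  h \le \log_{\varphi} L = \frac{\lg L}{\lg \varphi} \approx 1.4404 \lg L
```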

The standard implementation of AVL trees involves keeping two bits per node to encode the three cases (i) the node is balanced, (ii) the left subtree is one deeper, and (iii) the right subtree is one deeper. The necessity of maintaining two bits per node has been viewed as a disadvantage, and some researchers have dealt with modifying the basic properties of the trees in order to implement them with one bit per node [Ko], [KK]. Program 4 gives a direct implementation using only one bit per node. M. Brown [Br2] has remarked that this can also be done in a more straightforward way, by pushing the two bits of the balance factor down, one to each son. This corresponds to an alternative coloring of the trees: a node is marked red if its height is one greater than that of its brother. In this coloring, if red links are given weight 2 (and black links weight 1), then, from any internal node, all paths to external nodes have the same weight. (In our framework, red links are given weight 0.) It is also possible to color 2-3 and 2-3-4 trees with black and (double-weight) red links to give a constant weighted path length from each node: color both sons of each 2-node and the "upper" son of each 3-node red. This leads to an alternate dichromatic framework to the one we have been discussing. We have chosen to use zero-weight links because the algorithms appear to be somewhat simpler.

All of the algorithms described here have two features which make them cumbersome to implement. First, there are two loops: one controlling the search (going down the tree), and one controlling the insertion (normally going up the tree). Second, the code for the balance procedure is rather cumbersome, as it has to handle the four cases of left and right single and double rotations. In the next section, we will see new algorithms which avoid both of these difficulties.

2. Top-Down Algorithms

The new algorithms, which also are conveniently embedded within the dichromatic framework, are based on the common theme that the rebalancing transformations are applied on the way down the tree during an update operation. Thus, when an insertion search encounters an external node, the record being inserted can be attached right there, and the operation is complete. The algorithms need not maintain a stack, since no portion of the search path need be traversed again to restore the balance condition. In this respect, the algorithms are similar to the weight-balanced trees introduced by Reingold [RND]. Unfortunately those trees seem to require considerably more than one bit of balance information per node.

Figure 13. Top-down 2-3-4 tree transformations

The benefits of rebalancing on the way down will become more apparent in subsequent sections where we discuss performance issues. For the moment, suffice it to mention that we can at least hope for code which is simple, efficient, and elegant, since only one loop is necessary. Top-down schemes will also have inherent advantages for parallel updating, as each writer will need to lock only a bounded context around itself in the tree.

Perhaps the easiest such algorithm to explain is a top-down insertion algorithm for 2-3-4 trees. Such an algorithm can be built out of exactly the same transformations that were used in the more traditional bottom-up implementation presented in the previous section. The general idea is quite easy to explain, even for a general B-tree. As we go down a path, we split an encountered node if it is full, and insert the splitting key into the father. Note that the father cannot itself be full, so the splitting will not propagate.

Fig. 13 shows the transformations involved for the 2-3-4 case: a 2-node attached to a 4-node becomes a 3-node attached to two 2-nodes, and a 3-node attached to a 4-node becomes a 4-node attached to two 2-nodes. The transformations required for the colored binary tree are exactly those of Figs. 7, 4, and 2.

An implementation for 2-3-4 trees with rebalancing done on the way down is given in Program 5. It is interesting to compare this implementation with the standard bottom-up implementation of Program 3. Each does a color flip when the current node's sons are both red and then a rotation if the current node's father is also red. The top-down implementation manages to perform all the necessary transformations on the way down the tree. In order to perform the rotations, it is necessary to keep hold of the great-grandfather (gg), grandfather (g), and father (f) of the current node. The test for the actual insertion has been moved out of the inner loop by the artifice of making the universal external node (z) have two red sons. The sequence of trees produced for our sample keys is that of Fig. 8. It is possible to implement single and double rotations with somewhat less code than the balancing procedure of Program 2. The idea is to separate the two single rotations that Program 2 does to implement the double rotation. After the first rotation, the "current" node pointer is set high enough in the tree to set up the next rotation. The trick is to maintain the colors properly. The first part of the double rotation involves no color changes: both nodes involved are red before and after the rotation. The single rotation (which is also the second part of the double rotation) requires that the colors of the two nodes involved be switched. The algorithm proceeds as follows when a color flip causes a node to become red and the father of that node is also red. First, a single rotation is performed if necessary to make the two red links go in the same direction. Second, the current pointer is set to its great-grandfather node. Third, a single rotation is performed when the two reds in a row are encountered, to complete the balancing operation. This requires an extra test within the inner loop, but the resulting code sequence is quite compact, as shown in Program 6.

    record node (color m; reference (node) l, r; integer k);
    reference (node) h, y, z;

    procedure initialize;
      begin y ← node(red, , , ); z ← node(black, y, y, ); h ← node(black, z, z, -∞); end;

    procedure search and insert (integer value v; reference (node) h);
      begin reference (node) x, gg, g, f; logical success;
      x ← h; k(z) ← v; success ← true;
      loop until v = k(x):
        gg ← g; g ← f; f ← x;
        if v < k(x) then x ← l(x) else x ← r(x) endif;
        if m(l(x)) = m(r(x)) = red then
          if x = z then x ← node(black, z, z, v); success ← false;
                        if v < k(f) then l(f) ← x else r(f) ← x endif;
          endif;
          m(l(x)) ← m(r(x)) ← black; m(x) ← red;
          if m(f) = red then balance(gg, g, f, x); x ← g endif;
        endif;
      repeat;
      m(r(h)) ← black;
      return(x, success);
    end "search and insert"

Program 5. Top-down 2-3-4 trees.

    if m(l(x)) = m(r(x)) = red or m(x) = m(l(x)) = red or m(x) = m(r(x)) = red then
      if x = z then x ← node(black, z, z, v); success ← false endif;
      if m(x) = black then m(l(x)) ← m(r(x)) ← black; m(x) ← red
                      else m(x) ← black; m(f) ← red endif;
      if m(x) = black or (m(f) = red and (v < k(g)) ≠ (v < k(f))) then
        if v < k(f) then l(f) ← r(x); r(x) ← f; f ← g
                    else r(f) ← l(x); l(x) ← f; f ← g endif
      endif;
      if v < k(f) then l(f) ← x else r(f) ← x endif;
      if m(x) = black and (m(f) = red or f = g) then x ← gg endif;
    endif;

Program 6. Top-down 2-3-4 trees (alternate balancing code for Program 5).

[...] whose brother is a 2-node into a 3-node whose brother is a 3-node, or a 2-node whose brother is a 4-node. However, 2-3 trees are not easily handled, because the splitting occurs when a node is full, not when it has overflowed. Thus a 3-node would have to split into a 2-node and a 1-node, which leads to obvious complications.

There are many variants of Program 5 which work on the same basic theme: "on the way down the tree, if a node with two red sons is encountered, do a color flip, and if two red links in a row are encountered, balance the tree nodes they connect." A remarkable variety of different tree structures can result depending upon: (i) which node involved in a transformation is the one causing it to happen, (ii) which transformation is given preference when more than one is applicable, and (iii) whether the application of a transformation ought to disable another from happening immediately. Program 5 corresponds to (i) having the color flip done by the top node involved and the balance done by the bottom node involved, (ii) ensuring that only one transformation is applicable at each node, and (iii) disabling the color flip after the balance but enabling the balance after the color flip.

One tempting variant is to (i) have both the color flip and the balance done by the top node involved, (ii) still ensure that only one transformation is applicable at each node, and (iii) not disable any transformations. The sequence of trees produced by such an algorithm for our sample keys is shown in Fig. 14. This algorithm corresponds to 2-3-4 trees whose 4-nodes are represented by three nodes connected with two red links in any orientation. Fewer balancing transformations are involved, but the trees are less balanced, and the algorithm is difficult to implement cleanly, since a node may have to examine both sons and two grandsons.

Another possibility is to change Program 6 to do a color flip after each balance (but then disable further balances which would involve going further back up the tree). Fig. 15 shows a particularly bad sequence of keys for this variant (the initial stages are omitted). Not only is the number of possible red subtrees greatly increased, but also three reds in a row could occur! (Consider, for example, what happens if the last tree in Fig. 15 is connected to some red node and a 12 is inserted. Then two red links are passed up, resulting in three reds in a row.) This example illustrates that some caution must be exercised if good balanced trees are to be produced.

However, there is still a great deal of flexibility left in designing top-down algorithms within this framework. As we shall see in the next section, the algorithms that we have been considering have essentially the same average case performance, so we should look for an algorithm which is easily implemented. One goal might be to find an algorithm which doesn't do any "double" rotations on the way down the tree. It turns out that such an algorithm is easily derived from Program 6, by simply removing all references to gg. The result, given as Program 7, is a method which allows two reds in a row to be encountered on the way down the tree, but only if they are oriented in the same direction. Fig. 16 shows the operation of this algorithm on our sample keys. The allowed connected red subtrees are shown in Fig. 17 (only one from each symmetric pair is included). The meaning of the labels on those trees will be discussed below.
Program 6.
The example above indicates that we must be careful to prove that
It is also possible to implement AVL trees from the top down, by Program 7 operates in the way that we expect. In particular, we need
adding the "brother" transfonnation of Fig. 11 to transfonn a 4-node

Figure 14. Constructing a top-down tree (first variant)

Figure 15. Constructing a top-down tree (second variant)

Figure 16. Constructing a top-down single rotation tree

Figure 17. The allowed red subtrees for Program 7.

to prove that the list L of allowed connected red subtrees is "closed" under the insertion operation. We can think of the algorithm as a sequence of traversals of the trees of L, each of which may cause one tree in L to be transformed into another. Although each tree has a black link into the root and black links at the leaves, the situation is complicated by the fact that the link into the root of the tree labelled c may become red. From the point of view of the subtree above that link, one of its bottom links will become red. We shall refer to this phenomenon as "passing a red up". This of course can also happen when the insertion terminates and an external node is replaced by a new red node. The situation is more completely described by the following lemmas:

LEMMA 2.1. Suppose that a red subtree in L is traversed top-down during the execution of Program 7, and that the subtree is exited on black link t. Then the following facts are true:
(i) link t may become red either by insertion of a new node or because it points to the root of a type c subtree;
(ii) if link t becomes red, then the links below it will become black, and no subsequent transformation during the current insertion will change their color;
(iii) whether link t becomes red or not, any connected red subtree resulting from transformations on the tree being traversed is in L.

LEMMA 2.2. Each of the trees in L does in fact arise.

These lemmas are easily proven by case analysis from Fig. 17. The letters on each of the black links leaving the trees denote the trees in L that will be formed if the subtree in question is traversed during execution of Program 7, the subtree is exited at that black link, and that black link turns red.

Since two reds in a row are allowed, the ratio of the length of the longest path to the length of the shortest path in the tree is now 3 (consider the situation when the keys inserted are in increasing order), so the length of the longest path in a tree of N nodes is bounded by 3 lg N.
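The step from the path-length ratio to the 3 lg N bound can be made explicit. The short derivation below is our own sketch rather than the authors' text; s is a symbol introduced here for the length of the shortest path from the root to an external node, and the factor 3 is the ratio stated above.

$$
\begin{aligned}
N &\ge 2^0 + 2^1 + \cdots + 2^{s-1} = 2^s - 1 && \text{(no external node lies above depth } s\text{)}\\
s &\le \lg(N+1)\\
\text{longest path} &\le 3s \le 3\lg(N+1).
\end{aligned}
$$

This differs from 3 lg N only in the lower-order difference between lg(N+1) and lg N.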
record node (color m; reference (node) l, r; integer k);
reference (node) h, y, z;

procedure initialize;
begin y ← node(red, , , ); z ← node(black, y, y, ); h ← node(black, z, z, −∞) end;

procedure search and insert (integer value v; reference (node) h);
begin reference (node) x, g, f; logical success;
  x ← h; k(z) ← v; success ← true;
  loop until v = k(x):
    g ← f; f ← x; if v < k(x) then x ← l(x) else x ← r(x) endif;
    if m(l(x)) = m(r(x)) = red or m(x) = m(l(x)) = red or m(x) = m(r(x)) = red then
      if x = z then x ← node(black, z, z, v); success ← false endif;
      if m(x) = black then m(l(x)) ← m(r(x)) ← black; m(x) ← red
      else m(x) ← black; m(f) ← red endif;
      if m(x) = black or (m(f) = red and (v < k(g)) ≠ (v < k(f))) then
        if v < k(f) then l(f) ← r(x); r(x) ← f; f ← g
        else r(f) ← l(x); l(x) ← f; f ← g endif
      endif;
      if v < k(f) then l(f) ← x else r(f) ← x endif;
    endif;
  repeat;
  m(r(h)) ← black;
  return(x, success);
end "search and insert"

Program 7. Top-down single rotation trees (2-3-4-5 trees).
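For readers who want to experiment with Program 7, the following is a sketch of it transcribed into Python. It follows the pseudocode statement by statement, but the names (`Node`, `SingleRotationTree`, `search_and_insert`, `keys`, `height`) and the boolean color encoding are our own assumptions. Read it as an illustration, not as the authors' implementation. The sentinels y and z and the header h play the same roles as in procedure initialize. As in the original, the returned flag is false exactly when a new node was created.

```python
import math
import random

RED, BLACK = True, False


class Node:
    """Node with color bit m, links l and r, and key k, as in Program 7."""
    __slots__ = ("color", "left", "right", "key")

    def __init__(self, color, left, right, key):
        self.color, self.left, self.right, self.key = color, left, right, key


class SingleRotationTree:
    """Hypothetical transcription of Program 7 (top-down single rotation trees)."""

    def __init__(self):
        # procedure initialize: y is red, z is black with both links to y,
        # and the header h (key -infinity) has z on both sides.
        self.y = Node(RED, None, None, None)
        self.z = Node(BLACK, self.y, self.y, None)
        self.head = Node(BLACK, self.z, self.z, -math.inf)

    def search_and_insert(self, v):
        """Return (node holding v, found); found is False if v was just inserted."""
        z, h = self.z, self.head
        x, f, g = h, None, None
        z.key = v                      # sentinel key stops the loop at the bottom
        found = True
        while v != x.key:
            g, f = f, x
            x = x.left if v < x.key else x.right
            if ((x.left.color == RED and x.right.color == RED)
                    or (x.color == RED and x.left.color == RED)
                    or (x.color == RED and x.right.color == RED)):
                if x is z:             # bottom reached: the flip below makes it red
                    x = Node(BLACK, z, z, v)
                    found = False
                if x.color == BLACK:   # color flip
                    x.left.color = x.right.color = BLACK
                    x.color = RED
                else:                  # x red with a red son: recolor, then rotate
                    x.color = BLACK
                    f.color = RED
                if x.color == BLACK or (f.color == RED
                                        and (v < g.key) != (v < f.key)):
                    # single rotation lifting x above its father f
                    if v < f.key:
                        f.left, x.right = x.right, f
                    else:
                        f.right, x.left = x.left, f
                    f = g
                if v < f.key:          # hang x below f (f is g after a rotation)
                    f.left = x
                else:
                    f.right = x
        h.right.color = BLACK          # keep the root black
        return x, found

    def keys(self):
        """Keys in symmetric (in-order) order."""
        out = []

        def walk(n):
            if n is not self.z:
                walk(n.left)
                out.append(n.key)
                walk(n.right)

        walk(self.head.right)
        return out

    def height(self):
        """Number of internal nodes on the longest path from the root."""
        def depth(n):
            return 0 if n is self.z else 1 + max(depth(n.left), depth(n.right))

        return depth(self.head.right)


if __name__ == "__main__":
    tree = SingleRotationTree()
    for key in random.sample(range(10**6), 20000):
        tree.search_and_insert(key)
    print(tree.height(), 3 * math.log2(20000 + 1))
```

Every rotation preserves symmetric order, so the in-order key list stays sorted whatever the colors do. Comparing `height()` with 3 lg(N+1) gives a quick empirical check of the path-length bound discussed above.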

The implementation in Program 7 is notable for its brevity: it requires only about 60% as much code as the classical AVL and 2-3 algorithms. The following section shows that it can also be expected to perform as well as these algorithms in a dynamic sense.

3. Performance Comparisons

Since balanced trees are suitable for a wide variety of applications, there are a number of different measures which could be used to compare the various algorithms we have been discussing. In the previous sections we have dealt with some static issues such as program size and overhead required. In this section we shall concentrate on the dynamic statistics of the various algorithms. There are essentially two costs of interest. One is the search cost, when a tree built by one of the algorithms is used for searches only. The other is the insertion, or balancing, cost. The first measures the balance quality of the trees built by the algorithm; the second measures the effort consumed in achieving this balance. We have already seen examples which support the intuitive notion that search cost may be traded for insertion cost and vice versa.

The dichromatic framework makes the task of comparing the algorithms somewhat simpler, since the properties of the binary trees produced can be studied in a color-blind manner. (As mentioned above, this corresponds to explicitly counting node-internal comparisons for 2-3 and 2-3-4 trees.) In what follows, we shall concentrate on the cost of unsuccessful searches: the length of the path traversed to an external node. (The cost of successful searches can be derived from this in a standard way [Kn].) In particular, we shall consider three different measures.

One is the worst-case path cost, which is the length of the longest path among all trees of N keys built by the algorithm. The second, and perhaps more representative, is the worst-case path length cost, which is the average search cost for the tree of maximal external path length, among all trees built by the algorithm. And finally we have the average cost, which is the average search cost for a random tree built by the algorithm, under the usual model that the N! possible permutations of the N keys used in building the tree are equally likely. Note also that for a given class of trees, the average, worst-case path length, and worst-case path costs form a non-decreasing sequence of numbers.

For a perfectly balanced tree of N keys the worst-case path, worst-case path length, and average cost are all essentially lg N, so this will form our de facto standard of comparison. Define the fractional cost to be the supremum, as N gets large, of the ratio of the cost in question to lg N. Thus the fractional worst-case path, worst-case path length, and average cost for perfectly balanced trees are all trivially 1. For trees produced by our algorithms, the fractional costs will be ≥ 1.

The situation for worst-case path cost is the simplest to analyze. It is well known that for AVL trees the fractional cost is $1/\lg\phi = 1.44\ldots$, which is achieved by the Fibonacci trees [Kn]. (A Fibonacci tree of height n is constructed by putting a Fibonacci tree of height n − 2 to the left of the root and one of height n − 1 to the right of the root. The tree of height 0 is a single external node; the tree of height 1 is an internal node with two external sons.) For 2-3 or 2-3-4 trees a tree which is entirely 2-nodes except for one path of 3-nodes gives a fractional cost of 2 (which is clearly the highest possible). Similarly, from the comments in the previous section, the fractional cost for the trees generated by Program 7 is 3.
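The worst-case figure for Fibonacci trees is easy to check numerically. The sketch below uses hypothetical helper names. It builds small Fibonacci trees directly from the recursive definition just given, as nested pairs with `None` standing for an external node. For large heights it counts keys with the same recursion instead of building the tree. The ratio of the height n to lg(N + 1) approaches $1/\lg\phi \approx 1.44$ from below, but only slowly, since the lower-order terms decay like 1/n.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def fibonacci_tree(n):
    """Fibonacci tree of height n as nested (left, right) pairs; None is external."""
    if n == 0:
        return None                    # a single external node
    if n == 1:
        return (None, None)            # an internal node with two external sons
    return (fibonacci_tree(n - 2), fibonacci_tree(n - 1))


def height(t):
    """Internal nodes on the longest root-to-external path."""
    return 0 if t is None else 1 + max(height(t[0]), height(t[1]))


def size(t):
    """Number of internal nodes (keys)."""
    return 0 if t is None else 1 + size(t[0]) + size(t[1])


def fibonacci_tree_size(n):
    """size(fibonacci_tree(n)) by the same recursion, without building the tree."""
    a, b = 0, 1                        # sizes for heights 0 and 1
    for _ in range(n):
        a, b = b, a + b + 1
    return a


def worst_case_ratio(n):
    """Height n divided by lg(keys + 1) for the Fibonacci tree of height n >= 1."""
    return n / math.log2(fibonacci_tree_size(n) + 1)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, worst_case_ratio(n))
    print("1/lg(phi) =", 1 / math.log2(PHI))
```

The key count satisfies size(n) = size(n − 1) + size(n − 2) + 1, which is $F_{n+2} - 1$. The number of external nodes is therefore $F_{n+2}$, as used in the analysis that follows.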
For the fractional worst-case path length cost the situation is more difficult and interesting. We have been able to improve on a number of previously known bounds. A common misconception is that Fibonacci trees maximize path length among all AVL trees of the same size. This would be nice, since the fractional cost for Fibonacci trees under this metric is quite low. A Fibonacci tree of height n has $F_{n+2}$ external nodes, and its external path length is defined by the recurrence

$$E_n = E_{n-1} + E_{n-2} + F_{n+2}, \qquad n \ge 2,$$

with $E_0 = 0$, $E_1 = 2$. The solution to this recurrence is

$$E_n = \frac{4n}{5}F_{n+1} + \frac{3n+3}{5}F_n,$$

so the fractional path length cost for a Fibonacci tree of height n is

$$\limsup_{n\to\infty} \frac{\frac{4n}{5}F_{n+1} + \frac{3n+3}{5}F_n}{F_{n+2}\lg F_{n+2}} = \frac{\frac{4}{5\phi} + \frac{3}{5\phi^2}}{\lg\phi} = 1.04\ldots$$

(Recall that $F_n$ is $\phi^n/\sqrt{5}$ rounded to the nearest integer, where $\phi = (\sqrt{5}+1)/2$ is the "golden ratio".) Fibonacci trees are only about 4% worse than optimal under this metric.

However, it is possible to construct AVL trees which are much worse. Given a Fibonacci tree of height n and some positive integer k, k < n, we can construct an "overweight" Fibonacci tree by replacing the rightmost (bottommost) Fibonacci subtree of height k by a complete binary tree with $2^k$ external nodes. Fig. 10 shows such a tree with n = 6 and k = 4. By appropriately choosing k, we can get a tree in which asymptotically all paths have the maximal possible length. Specifically, the fractional path length for the overweight Fibonacci tree is

$$\limsup_{n\to\infty} \frac{E_n - E_k + k2^k + (n-k)(2^k - F_{k+2})}{(F_{n+2} - F_{k+2} + 2^k)\,\lg(F_{n+2} - F_{k+2} + 2^k)}.$$

If k is chosen so that $2^k$ is about $nF_n$, then this limit equals

$$\limsup_{n\to\infty} \frac{n^2F_n}{nF_n\lg F_n},$$

which approaches $1/\lg\phi$.

For 2-3-4 trees, a similar construction leads to a fractional worst-case path length of 2. The situation for 2-3 trees is less clear. (This problem has been studied by Rosenberg and Snyder [RS].) We can easily upper bound this cost by 2, and an analogous construction to the above yields 2-3 trees with fractional cost of $2 - 1/(3\lg 3)$. We start the construction by building the already considered scrawny trees, which have maximum height for their number of leaves. (In the sequel heights will always refer to the dichromatic framework representation of these B-trees.) In the 2-3 or 2-3-4 case such trees are clearly composed of a single path of 3-nodes with every other node a 2-node. (A 2-node is also allowed at the root, if the height is odd.) These scrawny trees naturally correspond to the Fibonacci trees of the AVL case. Without loss of generality, we can assume that the rightmost chain is the one consisting of 3-nodes. To make these trees overweight, we choose a k and replace the rightmost scrawny tree of height k by, in each case, the bushiest possible tree of height k. This bushy tree consists entirely of 3-nodes in the 2-3 case, and entirely of 4-nodes in the 2-3-4 case. An appropriate choice of k as a function of n, the total tree height, now completes the argument. We only present some of the details of the 2-3 argument, as the 2-3-4 argument is somewhat simpler. The fractional worst-case path length cost for the 2-3 tree just constructed is

$$\limsup_{n\to\infty} \frac{\frac{5k}{6}3^{k/2} + (n-k)3^{k/2} + n2^{n/2}}{(3^{k/2} + 2\cdot 2^{n/2})\,\lg(3^{k/2} + 2\cdot 2^{n/2})}.$$

We now let $k = n/\lg 3 + \lg n$. It is then easy to check that the above limsup is $2 - 1/(3\lg 3)$.

Although we have not carried out the construction for the trees of Program 7, it is reasonable to conjecture that a fractional cost of 3 can be obtained.

No balanced tree algorithm has yet been completely analyzed under the average cost metric. The classical bottom-up algorithms are extremely difficult to analyze because they do not "preserve" randomness: given a random tree, its subtrees are not random trees. On the other hand, it is possible that the top-down algorithms may submit to analysis, because they perform their transformations blindly in a consistent fashion. However, even under the most generous randomness assumptions, the recurrences that arise in the analysis seem intractable. The question of whether any of these algorithms are truly asymptotic to lg N on the average is the most fundamental open question in the analysis of balanced trees.

It is possible to do a fringe analysis of the average case behavior assuming that the rebalancing transformations occur only at the "bottom" of the tree. Yao [Y] showed how to compute the average number of 2-nodes and 3-nodes at the bottom levels of 2-3 trees, and Brown [Br1] gave some similar results for AVL trees. Neither gave any results concerning path lengths, but these can be derived with the help of the following lemma for (arbitrary) binary search trees.

LEMMA 3. Given any binary search tree with n keys, let the average unsuccessful search cost be $C_n$. Then the average unsuccessful search cost after a random insertion is

$$C_{n+1} = C_n + \frac{2}{n+2}.$$

Proof. The external path length of the tree is $(n+1)C_n$. Each external node is equally likely to receive the insertion, with probability $1/(n+1)$. Notice that if the insertion is at level i, then the external path length increases by i + 2 (two new external nodes are created at level i + 1, less the one at level i). Therefore, the average increase in external path length is

$$\frac{1}{n+1}\sum_x \bigl(\mathrm{level}(x) + 2\bigr) = C_n + 2,$$

where the sum is taken over all external nodes x. This leads immediately to the recurrence

$$(n+2)C_{n+1} = (n+1)C_n + C_n + 2,$$

which proves the lemma. ∎

This lemma has a number of interesting consequences. By telescoping the recurrence, we get

$$C_N = C_n + 2H_{N+1} - 2H_{n+1}, \qquad N > n,$$

where $H_N$ denotes the N-th harmonic number [Kn]. (In particular, taking n = 0 and $C_0 = 0$ gives the well-known average unsuccessful search cost for random trees, $C_N = 2H_{N+1} - 2$, which says that the fractional cost for such trees is $2\ln 2 = 1.38\ldots$, since $H_N = \ln N + O(1)$.) If we start with a "seed" tree which is perfectly balanced, $C_n = \lg n$, then we get

$$C_N = \lg n + 2\ln(N/n) + O(1),$$

and by taking n large enough, say $n = \Theta(N/\lg N)$, we have trees with an optimum fractional cost,

$$C_N = \lg N + O(\log\log N).$$

This means that no balancing need be done at all if it can be ensured that the tree is perfectly balanced after a sufficient number of keys have been inserted.

Returning to the fringe analysis, let us consider how to calculate the average search cost for 2-3 trees under the assumption that rebalancing is only done at the bottom. Yao showed that the ratio of
2-nodes to 3-nodes on the bottom level is 2:1, so any particular external node belongs to a 3-node with probability 3/7 and a rotation is done on insertion with probability 2/7. If a rotation is done for an insertion on level i, then the external path length is increased by only i + 1, and Lemma 3 is easily modified to take this into account, with the result that the fractional average cost for such trees is $\frac{12}{7}\ln 2 = 1.188\ldots$. The result is the same for AVL and 2-3-4 trees.

In general, each of the algorithms that we have considered has a set of allowable connected red subtrees, L. For each tree t in L, the fringe analysis will give us the probability that a random insertion will strike one of the external nodes of t. The average external path length in a tree with N nodes, ignoring rotations above the bottom level, is

$$\Bigl(2 - \sum_{t \in L} p_t \sum_{x \in t} \Delta_x\Bigr)(H_{N+1} - 1).$$

In this formula the first sum is over all t in L, and the second over all external nodes x of t. In addition, $p_t$ denotes the probability of hitting a tree of type t, and $\Delta_x$ the saving in path length due to the rotation if one is done and 0 otherwise. (If the external node is at level i this is the difference between i + 2 and the increase in the external path length after the rotation.) Note that rotations at the fringe always reduce the path length; however, this need not be so for rotations higher up in the tree. In fact all the algorithms we have considered can be forced to do rotations that will increase the path length. This is another reason why a complete average case analysis is non-trivial.

As an application, consider the fringe analysis for Program 7, the top-down single rotation trees. Let a, b, c, d, e, f also represent the probabilities that a random insertion strikes an external node in a tree labelled a, b, c, d, e, f respectively in Fig. 17. (Then 2a, 3b, 4c, 4d, 5e, 5f are the probabilities that the respective trees themselves occur.) For simplicity, assume that these probabilities reach steady state after a sufficient number of nodes have been inserted (Yao shows how to make this precise). Then, from Fig. 17, we can write down the equations

$$
\begin{aligned}
a &= -2a + 4c + 3e + 3f\\
b &= 2a - 3b + 4c + 4e + 4f\\
c &= b - 4c + e + f\\
d &= 2b - 4d + 2e + 2f\\
e &= 2d - 5e\\
f &= 2d - 5f.
\end{aligned}
$$

We also have the normalization condition

$$2a + 3b + 4c + 4d + 5e + 5f = 1.$$

The solution to this set of simultaneous equations (there is one redundant equation) is

$$a = \tfrac{8}{105},\quad b = \tfrac{11}{105},\quad c = \tfrac{3}{105},\quad d = \tfrac{6}{105},\quad e = \tfrac{2}{105},\quad f = \tfrac{2}{105}.$$

Now, the only insertions for which Δ is non-zero are the rightmost three in tree d. The first saves 1, the other two save 2, so the average external path length in a tree with N nodes is

$$(2 - 5d)(H_{N+1} - 1) = \tfrac{12}{7}(H_{N+1} - 1),$$

the same as for 2-3 or AVL trees! (There are easier ways to prove this result; the intention here was to illustrate a general technique for the fringe analysis of any such algorithm.)

The fringe analysis can be extended upwards by considering the set of possible subtrees of red height 2, etc., but these sets tend to be large, and the calculations quickly become prohibitively difficult. (For AVL trees, M. Brown has remarked that the fringe analysis does not seem to extend in this fashion, because the transformation on a tree depends on the shape of the tree rooted at its brother.) It does appear that the fractional average cost quickly approaches 1. On the average, most of the rotations occur at the bottom levels; those higher up present a bad worst case.

The reader is again reminded that these fringe analyses rigorously prove nothing about the average case performance of the algorithms in the previous sections. We are unable to prove that for a given algorithm (e.g., 2-3 trees) the average case behavior of the fringe variant is an upper bound on the average case behavior of the real algorithm, though we conjecture this to be true. However, they can be taken as analytic evidence that the algorithms perform very well on the average.

From a practical standpoint, simulation studies of balanced tree algorithms consistently show that the fractional average case cost is very close to 1. (See [KFSK], [Kn].) Table 1 gives the results of simulations for five of the implementations given above, on five different random files of 20,000 nodes each. The main empirical observation that can be made from this table is that on the average all the algorithms have essentially the same behavior. Furthermore, the performance of all the methods seems to be extremely insensitive to the input. Since the external path length of a perfectly balanced 20,000-node tree is 287248, this data may be interpreted as showing that the average-case fractional cost of these algorithms is approximately 1.02.... Unfortunately, even for such large N, the value of lg N is so small that the same data is also consistent with the hypothesis that the fractional cost is 1 (or, in other words, that the average external path length is about lg N + O(1)). Though the simulations do not help resolve this theoretical question, they do indicate that the trees are extremely well balanced, since they are within 2% of optimal.

Another point worth noting is that the insertion cost for all of the algorithms is very low. The number of rotations or color flips to be expected is about one every two trips down the tree. Program 7 uses fewer rotations at the expense of a slightly less balanced tree. It is possible to get by with even fewer rotations at the expense of more imbalance: some of the variants mentioned in the previous section have this property. Finally, although one might expect the top-down algorithms to do significantly more rotations than their bottom-up counterparts, the table shows this not to be the case. A direct comparison between the top-down and bottom-up 2-3-4 tree algorithms shows their performance statistics to be extremely similar.

Since the algorithms are so similar in performance, it is wise to pay careful attention to the implementation, which can have a very significant effect on the performance. The empirical studies show that the "inner loop" of the algorithms is the search loop, which must therefore be carefully implemented. If searches are to be done much more often than insertions, it may be advisable to have a separate search procedure, then call "search and insert" if the search was unsuccessful. However, for most applications this is probably not worth the trouble, since the extra overhead in the inner loop for all the balanced tree algorithms is so small. The inner loops of the top-down algorithms can be "unwound" so that they involve only one
External path length
Program                      File 1   File 2   File 3   File 4   File 5
2-3 trees (Prog. 1)          292457   292725   291960   292124   292269
2-3-4 trees (Prog. 3)        293315   292680   293727   293010   292464
AVL trees (Prog. 4)          291708   292364   293479   292712   292433
top-down 2-3-4 (Prog. 5)     292816   292364   293479   292712   292433
single rotation (Prog. 7)    294422   294331   294197   294137   294753

Single rotations
Program                      File 1   File 2   File 3   File 4   File 5
2-3 trees (Prog. 1)           12537    12569    12543    12437    12534
2-3-4 trees (Prog. 3)         11643    11571    11558    11563    11567
AVL trees (Prog. 4)           14003    13875    14035    13965    13860
top-down 2-3-4 (Prog. 5)      11852    11758    11769    11726    11783
single rotation (Prog. 7)     11365    11040    11398    11306    11209

Color flips
Program                      File 1   File 2   File 3   File 4   File 5
2-3 trees (Prog. 1)           14912    14970    14938    14922    14848
2-3-4 trees (Prog. 3)         10280    10231    10238    10346    10280
AVL trees (Prog. 4)            9541     9492     9532     9509     9524
top-down 2-3-4 (Prog. 5)      11419    11393    11339    11439    11380
single rotation (Prog. 7)      9931     9967     9998     9948    10022

Table 1. Empirical data for five programs on five random 20,000-node files

more test than a straight search procedure. This compares favorably with the overhead required to maintain the stack (or remember where to start rebalancing) for the bottom-up algorithms. The test for Programs 6 and 7 is slightly more expensive than that for Program 5, but for most applications this is probably worthwhile in view of the simplicity of those algorithms. If search speed is essential or more bits per node are available, then there are other alternatives to consider. For example, on some computers it might be easier to keep the color bits with the links, rather than the nodes. This makes the extra test in the inner loop of the top-down algorithms even simpler to implement.

4. Further Topics

Balanced trees have utility in a wide variety of applications. Besides search and insertion, many other operations are commonly required of such data structures. Some examples are deletion, splitting, concatenation, and selection. A full discussion of these and other problems is given by Knuth [Kn]. Due to lack of space, not all of these problems can be considered here; rather, we shall attempt to illustrate some of the machinery involved by considering the deletion problem in detail.

For the classical balanced tree deletion algorithms, deletion is generally considered to be harder than insertion. Fortunately, the dichromatic framework and the top-down viewpoint can lead to deletion methods which are not much more complex than insertion. This is illustrated by Program 8, which completes the deletion operation for 2-3-4 trees in one top-down pass. It is well known that it suffices to consider the case that the node to be deleted is on the bottom level (has external sons). This is accomplished by doing a search for the node to be deleted, saving its position in t when it is encountered, and continuing until an external node is hit. Then the father of the external node is the successor to the node to be deleted. The deletion is completed by deleting this father after saving its key in the node pointed to by t. Now, if the bottom level node to be deleted is red, it may simply be removed: the difficulty is when a black node must be deleted. Program 8 ensures that this will never be necessary, by pushing a red down from the root to the bottom. The transformations involved are essentially those of Fig. 13 in reverse, with two additions: (i) 3-nodes are rotated, if necessary, so that the red (bottom) node is traversed, and (ii) if a 2-node is encountered which has a 3- or 4-node for a brother, a balance transformation is performed which makes the node being traversed a 3-node. The various transformations are diagrammed in Fig. 18. The sequence of trees resulting from deleting our sample set of keys, in reverse order from insertion, is shown in Fig. 19. Since Program 8 works for any 2-3-4 tree, it may be used on trees built by either Program 3 or Programs 5 and 6. A similar algorithm is available for Program 7, but 2-3 trees and AVL trees are still somewhat more difficult to handle.
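The reduction to a bottom-level deletion can be illustrated without any of the balancing machinery. The sketch below is a deliberate simplification, not Program 8. It works on a plain, uncolored binary search tree (the names `BSTNode`, `insert` and `delete` are hypothetical) and mirrors only Program 8's search discipline. Equal keys send the search right, the node holding v is remembered in `t`, and the search continues until an external link is reached. The last internal node `f` is then the successor of `t`, or `t` itself if `t` has no right subtree. Its key is copied into `t` and `f` is spliced out.

```python
import math


class BSTNode:
    """Uncolored binary search tree node (hypothetical, for illustration only)."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(head, v):
    """Plain unbalanced insertion below a header whose key is -infinity."""
    parent, x = head, head.right
    while x is not None:
        parent, x = x, (x.left if v < x.key else x.right)
    if v < parent.key:
        parent.left = BSTNode(v)
    else:
        parent.right = BSTNode(v)


def delete(head, v):
    """Delete v by splicing out a bottom node; return True if v was present."""
    g, f, x, t = None, None, head, None
    while x is not None:
        g, f = f, x
        x = x.left if v < x.key else x.right   # equal keys go right, as in Program 8
        if x is not None and x.key == v:
            t = x                                # remember where v lives
    if t is None:
        return False
    # f is the father of the external link that ended the search: the successor
    # of t, or t itself when t has no right subtree. Only f's son off the search
    # path can be an internal node, and it takes f's place under g.
    son = f.right if v < f.key else f.left
    if v < g.key:
        g.left = son
    else:
        g.right = son
    t.key = f.key                                # no-op when f is t
    return True


if __name__ == "__main__":
    head = BSTNode(-math.inf)                    # header node, as in the paper's programs
    for k in [50, 30, 70, 20, 40, 60, 80]:
        insert(head, k)
    delete(head, 50)
    print(head.right.key)                        # prints 60: the successor's key moved into t
```

In the red-black setting, the extra work of Program 8 is what guarantees that this final node `f` is red when it is removed.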

Figure 18. Top-down 2-3-4 deletion transformations

Figure 19. Destroying a 2-3-4 tree

procedure delete (integer value v; reference (node) h);
begin reference (node) x, g, f, t, b, bb;
  x ← h; k(z) ← −∞; t ← nil;
  if m(l(r(h))) = m(r(r(h))) = black then m(r(h)) ← red endif;
  loop until x = z:
    g ← f; f ← x;
    if v < k(x) then x ← l(x); b ← r(x); bb ← l(b)
    else x ← r(x); b ← l(x); bb ← l(b) endif;
    if v = k(x) then t ← x endif;
    if m(x) = m(f) = black and m(b) = red then balance(g, f, b, bb) endif;
    if m(x) = m(l(x)) = m(r(x)) = black then
      m(x) ← red; m(f) ← black;
      if m(b) = m(l(b)) = m(r(b)) = black then m(b) ← red
      elseif m(l(b)) = red then balance(g, f, b, l(b))
      elseif m(r(b)) = red then balance(g, f, b, r(b))
      endif;
    endif;
  repeat;
  m(r(h)) ← m(z) ← black;
  if t ≠ nil then if v < k(g) then l(g) ← z else r(g) ← z endif;
    k(t) ← k(f) endif;
  return(t);
end "delete"

Program 8. Deletion for 2-3-4 trees.

One nice feature of the dichromatic framework is that it allows us to decouple the job of keeping the tree balanced from the operations of insertion and deletion. We can design a balancer which works on the basis of local context only, without having to gather tree-wide information. Such a balancer traverses the tree and uses our standard transformations: two reds on a path cause a rotation, and a black with two reds below causes a color flip. With careful traversal design the balancer can be shown to have the following property. If we start with any red-black tree satisfying conditions 1. and 2. of section 1 then, after the balancer has made O(lg N) passes over the tree, the resulting tree will be balanced, in the sense of satisfying condition 3 for (one of) our algorithms. (Note that conditions 1. and 2. allow extremely unbalanced trees, for instance ones where all internal nodes are on one red linear chain.) This implies we can run the balancer asynchronously with the tree updaters, and if we guarantee that it receives enough cycles, then we know that the tree will remain well balanced. The simplicity of the rebalancing decisions and transformations makes it attractive to consider putting such a balancer into microcode and/or hardware.

The previous paragraph raised some issues about concurrent access to our trees. As we have already mentioned, the top-down approach implies that inserters and readers do not interfere as long as they lock a small bounded context in the tree around themselves. In fact, it is possible to do the rebalancing in the "shadow" of the real tree, with the result that readers are never locked out at all. The only penalty is that writers will then have to lock a slightly wider context. Deleters are somewhat more difficult to handle. The only difficulty is the dangling reference t in the middle of the tree. One can then either lock the search path below t, or else rotate t to the bottom of the tree. (This can be done by a sequence of rotations which maintain the defining balance properties.) Many other ramifications for parallel execution of the top-down approach remain to be explored and we hope to undertake them in a future report. For a discussion of similar issues see the work of Kung and Lehman [KuLe].

In this paper, we have exhibited a framework suitable for studying the implementation and performance of a variety of balanced tree algorithms. Within this framework, we were able to develop new algorithms which perform as well as the classical algorithms but are significantly simpler. The dichromatic framework not only has sufficient flexibility to aid in developing new techniques, but is also simple enough to perhaps lead to a complete analysis of some balanced tree algorithm.

Acknowledgements: The authors wish to thank Lyle Ramshaw for many helpful discussions and especially for his contributions to section 3. Clark Thompson and Mark Brown offered valuable comments on both the form and the content of the manuscript. Finally, the authors wish to acknowledge the use of the MACSYMA system at MIT for checking some of the calculations in section 3.

5. References

[AHU] Aho A., Hopcroft J., and Ullman J., The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974.

[AVL] Adelson-Velskii G., and Landis E., On an information organization algorithm, Doklady Akademica Nauk SSSR 146 (1962), 263-266.

[Ba] Bayer R., Symmetric binary B-trees: data structure and maintenance algorithms, Acta Informatica 1 (1972), 290-306.

[BaMc] Bayer R., and McCreight E., Organization and maintenance of large ordered indices, Acta Informatica 1 (1972), 173-189.

[Br1] Brown M., A partial analysis of height-balanced trees, SIAM J. Comp. (to appear).

[Br2] Brown M., A storage scheme for height-balanced trees, IPL (to appear).

[KFSK] Karlton P., Fuller S., Scroggs R., and Kaehler T., Performance of height-balanced trees, CACM 19 (1976), 23-28.

[Kn] Knuth D., The Art of Computer Programming, Vol. III, Sorting and Searching, Addison-Wesley, 1973.

[KuLe] Kung H.T., and Lehman P.L., A concurrent database manipulation problem: binary search trees, to appear in the proceedings of the 1978 Large Data-Base Conference, Berlin.

[RND] Reingold E., Nievergelt J., and Deo N., Combinatorial Algorithms: Theory and Practice, Prentice-Hall, 1977.

[RS] Rosenberg A., and Snyder L., Minimal-comparison 2,3-trees, SIAM J. Comp. (to appear).

[Y] Yao A., On random 2-3 trees, Acta Informatica 9 (1978), 159-170.
