Short Note: 410 The Computer Journal, Vol. 35, No. 4, 1992
A note on HEAPSORT

In The Computer Journal 33 (3), 1990, Xunrang and Yuzhang presented a new algorithm for HEAPSORT to reduce the number of comparisons from 2n lg n to (4/3)n lg n in the worst case.† By recursing on their technique we show how the cost can be reduced to n lg n + n lg lg n comparisons. Combining this with another known technique that has the same complexity, we achieve the optimal solution for deleting the maximum element from a heap. Using this for sorting we get slightly less than n lg n + n lg* n comparisons.

† In this paper lg n denotes log2 n.

Received October 1990, revised January 1991
1. Introduction

In 1964 Williams⁹ introduced a sorting algorithm called HEAPSORT that sorted in O(n lg n) time using only a constant amount of extra storage. This is asymptotically optimal for the sorting problem if we have a comparison-based algorithm. In fact the minimum number of comparisons to sort n elements is n lg n + O(n), while Williams's algorithm took 3n lg n. Later in 1964, Floyd⁶ reduced the cost for heap creation so that the HEAPSORT algorithm runs in 2n lg n comparisons. This is also the version that can be found in most text books. In 1987 Carlsson² gave an algorithm that used only n lg n + n lg lg n comparisons. Gonnet and Munro⁷ showed that the crucial delete-max operation can be performed in lg n + lg* n comparisons, which will also affect the time for HEAPSORT. lg* n is the iterated logarithm, defined as 0 if n <= 1 and 1 + lg*(lg n) otherwise.

In 1990 Xunrang and Yuzhang¹⁰ gave a method to reduce the cost from 2 lg n to (4/3) lg n. This result is, of course, superseded by the results of Carlsson and of Gonnet and Munro. However, there are some ideas in their article that help to explain the complexity of HEAPSORT. In this note we will exploit the ideas of Xunrang and Yuzhang and combine them with another idea to achieve the optimal algorithm. The presentation starts from a conceptually simple idea and, through comprehensible transformations, arrives at a fairly complicated algorithm. Hopefully, this will be easier for the reader to understand than the original papers.
2. Data Structure and Algorithms

The (max-)heap is a data structure that can be viewed as a binary tree where all levels are full, except maybe for the last one. In the last level the leaves are stored as far to the left as possible. The tree can be stored implicitly in an array, where a node at position k has its children at positions 2k and 2k+1, and its parent at position k div 2. The height of such a heap is ⌊lg n⌋. Furthermore, any node has a key value that is at least as big as its children's and not bigger than its parent's. This fact makes all paths from a leaf to the root into a sorted list in which we can perform a binary search. More details on heaps can be found in almost all introductory text books on data structures and algorithms, cf. Aho et al.¹

A HEAPSORT algorithm starts by building a heap from the elements that have to be sorted. This is done by first considering the last node that has a child. We can regard this as a heap where the children of the root are heaps, but the root itself might violate the heap property. Restore the heap property of this heap and repeat for all elements that have any children, level by level. This heap creation will take linear time. After that, the maximum element, which is now at the root of the heap, is swapped with the last leaf and the heap property is restored in logarithmic time. The extraction of the maximum element is repeated until all elements are removed in decreasing order. They will then be stored in ascending order in the array.

The crucial part of the sorting is the restoring of the heap after a deletion. This is done by first swapping the largest element (the root) with the last element in the array. The largest element is at this point not at the root of the tree, so we exchange the root with its largest child. Now the heap property might be violated one level further down in the heap, and we have to determine if the element should stay or be swapped with its largest child. This will in the worst case give two comparisons on each level of the heap, which gives a total of 2 lg n comparisons.
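The classical restoring step described above can be sketched as follows (a 1-based array with an unused slot 0; the helper names are ours). The sift-down spends at most two comparisons per level: one to find the larger child and one to test the heap property:

```python
def sift_down(a, i, n):
    """Classical restore (Williams/Floyd) on a 1-based max-heap a[1..n]:
    at most two comparisons per level of the heap."""
    while 2 * i <= n:
        j = 2 * i                       # left child of i
        if j < n and a[j + 1] > a[j]:   # comparison 1: pick the larger child
            j += 1
        if a[i] >= a[j]:                # comparison 2: does the element stay?
            break
        a[i], a[j] = a[j], a[i]         # swap down and continue one level lower
        i = j

def is_heap(a, n):
    """Check that every node's key is at least as big as its children's."""
    return all(a[i // 2] >= a[i] for i in range(2, n + 1))
```

Applying sift_down for i = n div 2 downto 1 is exactly the linear-time heap creation described above.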
In the algorithm of Xunrang and Yuzhang they used the observation that it is not necessary to do two comparisons on every level. Instead, the children of the root are compared, and the largest is moved up to the root. This hole is filled by its largest child, recursively, k times. Then we compare the last leaf with the latest promoted element. If it is larger, we perform an insertion in the heap above this hole by repeatedly swapping it with its smaller ancestors. If it is smaller, we use Williams's algorithm for rearranging the heap rooted at the hole on level k.

The cost for this is first k comparisons to reach level k. If the last leaf goes in the top part of the heap (above the hole), we have at most k more comparisons. If the last leaf goes in the bottom part of the heap the extra cost will be at most 2 on each level, and there are lg n - k levels. The worst-case cost for each restructuring is thus not more than the maximum of 2k and k + 2(lg n - k) = 2 lg n - k, which is minimized to (4/3) lg n when k = (2/3) lg n. This is clearly better than the cost for using Williams's algorithm.

The cost can be further reduced by using something better than Williams's algorithm in the bottom part of the heap. Xunrang and Yuzhang have devised such an algorithm, as we showed above. This will give us a worst-case cost for the delete-max of max(2k, k + (4/3)(lg n - k)). The minimum for this is (8/7) lg n, for k = (4/7) lg n. Again, we can apply this new algorithm recursively to get an even better bound. With this algorithm, where the same procedure is used recursively whenever the last leaf falls in the bottom part, a delete-max from a heap of height h will cost at most:

    T(h) = 2                              if h = 1
    T(h) = max(2k, k + T(h - k) + 1)      otherwise

If we select k to be h/2 the cost to insert an element in the top part of the heap will be less expensive than to restructure the bottom part. The cost is in that case given by:

    T(h) = 2                    if h = 1
    T(h) = h/2 + T(h/2) + 1     otherwise

which is h + lg h + O(1).
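The k = h/2 recurrence can be checked numerically; a small sketch (the integer split k = h div 2 is our choice). For h a power of two this integer variant solves exactly to h + lg h + 1, matching the h + lg h + O(1) claim:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(h: int) -> int:
    """Worst-case comparisons of the recursive scheme with k = h div 2."""
    if h == 1:
        return 2
    k = h // 2                  # split level: top-part insertion vs. recursion below
    return k + T(h - k) + 1

# T(2^m) = 2^m + m + 1, i.e. h + lg h + 1:
for m in range(1, 11):
    assert T(2 ** m) == 2 ** m + m + 1
```

Summed over the n delete-max operations, this per-operation cost is where the n lg n + n lg lg n sorting bound comes from.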
This gives a total cost for sorting of n lg n + n lg lg n + O(n), which is the same as for the algorithm by Carlsson.² Carlsson, however, uses a different technique to achieve this bound. By observing that each path from a leaf to the root is sorted, a binary search can be used in such a path. The algorithm, described in the terminology above, is to let k = h but use a binary search upwards to find the place to insert the last leaf. That is, we find this special path of maximum children all the way down to the leaves at a cost of h, and then a binary search is performed with the last leaf, costing lg h. This gives a total cost of lg n + lg lg n + O(1) for the operation.

When both of these ideas are combined (find the path k steps and perform a binary search upwards), we get a cost that is at most:

    T(h) = 2                                   if h = 1
    T(h) = max(k + lg k, k + T(h - k) + 1)     otherwise
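Carlsson's pure binary-search variant (k = h: follow the path of maximum children down to a leaf, then binary-search upwards along that sorted path) can be sketched as follows; the function names and the bisect-based search are ours:

```python
from bisect import bisect_right

def delete_max(a, n):
    """Delete-max on a 1-based max-heap a[1..n]: walk the special path of
    maximum children down to a leaf (about h comparisons), then binary-search
    that sorted path for the old last leaf (about lg h more comparisons).
    Returns the maximum; the heap then occupies a[1..n-1]."""
    top, last = a[1], a[n]
    n -= 1
    if n == 0:
        return top
    # Follow the path of maximum children down to a leaf.
    path = [1]
    while 2 * path[-1] <= n:
        j = 2 * path[-1]
        if j < n and a[j + 1] > a[j]:
            j += 1
        path.append(j)
    # Keys a[path[1]], ..., a[path[-1]] are non-increasing, so the reversed
    # key list is ascending and admits a binary search.
    keys = [a[p] for p in path[1:]]
    t = len(keys) - bisect_right(keys[::-1], last)  # path keys strictly > last
    for i in range(t):                   # promote the larger keys one level up
        a[path[i]] = a[path[i + 1]]
    a[path[t]] = last                    # drop the old last leaf into place
    return top
```

Choosing k between the two extremes is exactly what the combined recurrence balances.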
If we select k = h - lg h the cost for searching upwards will be at most h, and thus the cost will be given by:

    T(h) = 2                           if h = 1
    T(h) = h - lg h + T(lg h) + 1      otherwise

which has a solution of h + lg* h + O(1). This is exactly the algorithm given by Gonnet and Munro.⁷ This algorithm can be slightly refined by balancing the costs for insertion and rebalancing,⁴ but this will only affect the constant term. Gonnet and Munro also showed that this is the optimal cost for deleting the maximum element in a heap.
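The k = h - lg h recurrence can likewise be evaluated numerically (the integer floors are our choice); the overhead beyond h stays tiny, consistent with the h + lg* h + O(1) solution:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T_gm(h: int) -> int:
    """Recurrence T(h) = h - lg h + T(lg h) + 1, with floor(lg h) via bit_length."""
    if h <= 1:
        return 2
    lg = h.bit_length() - 1      # floor(lg h)
    return (h - lg) + T_gm(lg) + 1

# The cost exceeds h only by a small additive term:
for m in (4, 8, 16):
    assert T_gm(2 ** m) - 2 ** m <= 6
```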
It is interesting to note that for the average case it is best to find the path of maximum children all the way down to a leaf, and then compare upwards using a linear search. For a random heap the average cost for a delete-max operation will be slightly less than h + 1.3 comparisons, as shown by Carlsson.³ When this strategy is used for sorting, Carlsson also showed results indicating an average number of comparisons of only n lg n + 0.4n. Wegener⁸ showed that this algorithm has a worst case of at most 1.5n lg n comparisons, which has been proven tight by Fleischer.⁵

As can be noted, the worst case for sorting can be less than the sum of the worst cases for all different sizes. This depends on the fact that not all delete-max operations can be of maximal cost.

3. Conclusion

In this paper we have taken the result of Xunrang and Yuzhang for HEAPSORT and improved on their ideas. It has yielded a new, and hopefully more comprehensible, way to describe the best algorithms already published for the delete-max operation in a heap, and also for sorting using repeated deletions from a heap. One of the intermediate algorithms had a worst-case complexity of n lg n + n lg lg n without using an explicit binary search. It has proved to be much faster on average than the best previously presented algorithm with the same worst case. They have been implemented in PASCAL on a SUN-3/80, and the new algorithm is approximately 2.5 times faster than the old worst-case algorithm and, on average, only 50% slower than the best average-case algorithm (see Fig. 1).

S. CARLSSON
Department of Computer Science,
Luleå Technical University,
S-951 87 Luleå, Sweden
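The Pascal of Fig. 1 transliterates almost line by line into Python; a rough sketch (ours: a 1-based list with an unused slot 0, keys stored directly, floor(lg) computed via bit_length, and an added guard for n < 2):

```python
def restore(field, i, n, x):
    """Python rendering of RESTORE from Fig. 1 (1-based, field[0] unused)."""
    j, t, stop = 2 * i, 0, False
    levels = (n // i).bit_length() - 1       # floor(lg(n div i))
    k = levels // 2
    while j < n and not stop:
        t += 1
        if field[j + 1] >= field[j]:         # pick the larger child
            j += 1
        if t <= levels - k:                  # top part: promote without testing x
            field[j // 2] = field[j]
            j = 2 * j
        elif field[j] <= x:                  # bottom part: compare with x
            stop = True
        else:
            field[j // 2] = field[j]
            j = 2 * j
            k = k // 2                       # recurse with a halved split point
    if j == n and not stop:
        field[j // 2] = field[j]
    else:
        j = j // 2
    while j > i:                             # insert x upwards by linear search
        if x > field[j // 2]:
            field[j] = field[j // 2]
            j = j // 2
        else:
            i = j
    field[j] = x

def heapsort(field, n):
    """Python rendering of HEAPSORT from Fig. 1; sorts field[1..n] ascending."""
    if n < 2:                                # guard (ours): Fig. 1 assumes n >= 2
        return
    for i in range(n // 2, 0, -1):           # linear-time heap creation
        restore(field, i, n, field[i])
    for i in range(n - 1, 1, -1):            # repeated delete-max
        temp = field[i + 1]
        field[i + 1] = field[1]
        restore(field, 1, i, temp)
    field[1], field[2] = field[2], field[1]
```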
    procedure RESTORE(i, n: integer; x: elementtype);
    var j, t, k, Levels: integer; stop: boolean;
    begin
      j := 2*i; t := 0; stop := false;
      Levels := floor(lg(n div i));
      k := Levels div 2;
      while (j < n) and not stop do
      begin
        t := t + 1;
        if field[j+1].key >= field[j].key then
          j := j + 1;
        if t <= Levels - k then
        begin
          field[j div 2].key := field[j].key;
          j := 2*j;
        end
        else begin
          if field[j].key <= x then stop := true
          else begin
            field[j div 2].key := field[j].key;
            j := 2*j;
            k := k div 2;
          end;
        end;
      end;
      if (j = n) and not stop then
        field[j div 2].key := field[j].key
      else
        j := j div 2;
      while j > i do
        if x > field[j div 2].key then
        begin
          field[j].key := field[j div 2].key;
          j := j div 2
        end
        else
          i := j;
      field[j].key := x;
    end; (*RESTORE*)

    procedure HEAPSORT(field: elementarray; n: integer);
    var i: integer; temp: elementtype;
    begin
      for i := n div 2 downto 1 do
        RESTORE(i, n, field[i].key);
      for i := n-1 downto 2 do
      begin
        temp := field[i+1].key;
        field[i+1].key := field[1].key;
        RESTORE(1, i, temp)
      end;
      temp := field[1].key;
      field[1].key := field[2].key;
      field[2].key := temp;
    end; (*HEAPSORT*)

Fig. 1. The new efficient HEAPSORT algorithm with a worst case of n lg n + n lg lg n comparisons. The elements to be sorted must be stored in an array at positions 1..n, where the array and n are given as parameters to HEAPSORT. In the RESTORE procedure ⌊lg(n div i)⌋ has to be computed. The best way of doing this is machine-dependent, and is left to the implementer.

References

1. A. V. Aho, J. E. Hopcroft and J. D. Ullman, Data Structures and Algorithms. Addison-Wesley, Reading, Mass. (1983).
2. S. Carlsson, A variant of heapsort with almost optimal number of comparisons. Information Processing Letters 24, 247-250 (1987).
3. S. Carlsson, Average-case results on heapsort. BIT 27, 2-17 (1987).
4. S. Carlsson, An optimal algorithm for deleting the root of a heap. Information Processing Letters 37, 317-320 (1991).
5. R. Fleischer, A tight lower bound for the worst case of Bottom-Up Heapsort. Proceedings 2nd International Symposium on Algorithms, Taipei, Taiwan, Lecture Notes in Computer Science 557, Springer-Verlag, pp. 251-262 (1991).
6. R. W. Floyd, Algorithm 245 - Treesort 3. Comm. ACM 7 (12), 701 (1964).
7. G. H. Gonnet and J. I. Munro, Heaps on heaps. SIAM Journal on Computing 15 (4), 964-971 (1986).
8. I. Wegener, Bottom-up-Heap Sort, a new variant of Heap Sort beating on average Quick Sort. Proceedings, Mathematical Foundations of Computer Science 1990, Banska Bystrica, Czechoslovakia, pp. 516-522 (1990).
9. J. W. J. Williams, Algorithm 232. Comm. ACM 7 (6), 347-348 (1964).
10. G. Xunrang and Z. Yuzhang, A new HEAPSORT algorithm and the analysis of its complexity. The Computer Journal 33 (3), 281-282 (1990).