K Way Merge N Sort ACM SE Regl 1993
William A. Greene
Computer Science Department
University of New Orleans
New Orleans, LA 70148
wagcs@uno.edu
Abstract: We present a divide-and-conquer algorithm for merging k sorted lists, namely, recursively merge the first ⌊k/2⌋ lists, do likewise for the last ⌈k/2⌉ lists, then merge the two results. We get a tight bound for the expense, in comparisons made between list elements, of this merge. We show the algorithm is cheapest among all similar divide-and-conquer approaches to k-way merging. We compute the expense of the k-ary sort, which, in analogy to the binary sort, divides its input list into k sublists. Sometimes the k-ary sort has the same expense as the binary sort. Finally we briefly consider parallelizing these algorithms.

Introduction

Knuth tells us [10, p. 161] that the merge sort -- here we shall call it the binary sort -- was one of the first sorting algorithms suggested for computer implementation. There has been continuing interest in speeding up the merge operation (for instance, Sprugnoli [12], Carlsson [3], Thanh et al. [13], Dudzinski and Dydek [5], Brown and Tarjan [2], Trabb Pardo [14], Hwang and Lin [9]) and also interest in the speed-ups possible through parallelism (Cole [4], Shiloach and Vishkin [11], Hirschberg [8], Gavril [7], Even [6]). In all the references listed, the authors have considered the case of merging two sorted lists. In this paper we shall study the merging of k ≥ 2 sorted lists. The topic of k-way merging has been considered before, but only lightly and in the context of external sorting ([1], [10]).

We present a simple divide-and-conquer algorithm for k-way merging. The algorithm resembles the merge sort itself: first recursively merge the first ⌊k/2⌋ lists, then do likewise for the last ⌈k/2⌉ lists, and finally merge the two results. We obtain a good tight bound on the number of comparisons between list elements made by our divide-and-conquer k-way merge algorithm. We show that the algorithm does the fewest comparisons among all similar divide-and-conquer approaches to k-way merging. We compute the cost (in comparisons) of the k-ary sort, which generalizes the binary sort by dividing its input list into (not 2 but) k approximately equal-sized sublists; sometimes the cost of the k-ary sort is identical to that of the binary sort. Finally we briefly consider parallelizing our algorithms.

In this paper we will always sort lists into ascending (versus descending) order. Lists are assumed to be sequentially implemented (versus a linked implementation). The floor ⌊x⌋ and ceiling ⌈x⌉ functions have their usual meanings: for a real number x,

⌊x⌋ = greatest integer i such that i ≤ x,
⌈x⌉ = least integer i such that i ≥ x.

"log" will mean the base-two logarithm log2; a logarithm to some other base b will be explicitly subscripted log_b. When measuring the run-times of algorithms we will consider worst-case run-times.
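These conventions can be exercised in a short sketch. The paper's pseudocode is Ada-like, so the choice of Python here is ours; the sketch checks the floor/ceiling definitions and a floor/ceiling identity of the kind the later proofs rely on, namely ⌈log(k/2)⌉ = ⌈log k⌉ − 1 for even k.

```python
import math

def ceil_log2(x: int) -> int:
    """The paper's ceil(log x): ceiling of the base-two logarithm."""
    return math.ceil(math.log2(x))

# Floor and ceiling as defined: for real x, floor(x) is the greatest
# integer <= x, and ceil(x) is the least integer >= x.
assert math.floor(7 / 2) == 3 and math.ceil(7 / 2) == 4

# Identity used in the proofs: for even k >= 2,
# ceil(log(k/2)) = ceil(log k - 1) = ceil(log k) - 1.
for k in range(2, 1000, 2):
    assert ceil_log2(k // 2) == ceil_log2(k) - 1
```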
The binary sort achieves its good runtime by a divide-and-conquer strategy, namely, that of halving the list being sorted: the front and back halves of the list are recursively sorted separately, then the two results are merged into the answer list. An implementation is

procedure BINARY_SORT(L: in/out List_type);
  local L1, L2: List_type;
begin
  if length(L) > 1 then        -- write n for length(L)
    L1 := L(1 .. ⌈n/2⌉);
    L2 := L(⌈n/2⌉ + 1 .. n);
    BINARY_SORT(L1);
    BINARY_SORT(L2);
    MERGE(L1, L2, L);
  end if;
end;

The procedure MERGE(L1, L2: in List_type; L: out List_type) that we have in mind for sorting two lists is described as follows. Initialize pointers to the first item in each list L1, L2, and then

repeat
  compare the two items pointed at;
  move the smaller into L;
  advance the corresponding pointer to the smaller's neighbor;
until one of L1, L2 exhausts;
drain the remainder of the unexhausted list into L;

The basic operation (BO) we shall be counting to obtain the (worst-case) cost of algorithms will be that of comparing two list elements. If the two input lists to MERGE have lengths m and n respectively, then MERGE does at most m + n − 1 BO's (the draining does comparisons, but not of list elements). (MERGE does at least min{m, n} BO's.) If |m − n| ≤ 1 then algorithm MERGE does as few BO's as any algorithm for merging two sorted lists [Knuth, Theorem 5.3.2−M].

If f(n) = the (worst-case) number of BO's done by BINARY_SORT in sorting a list of length n, then f satisfies

(1) f(1) = 0
(2) f(n) = n − 1 + f(⌈n/2⌉) + f(⌊n/2⌋)

When n is a power of 2, the second equation becomes

(2′) f(n) = n − 1 + 2 f(n/2)

which has solution

f(n) = n log2 n − n + 1,  n a power of 2.

The general solution is

f(n) = n ⌈log n⌉ − 2^⌈log n⌉ + 1

for arbitrary integers n ≥ 1 [1, Ex. 9.12]. In particular, binary sort's runtime is O(n log n).

k-way merging

Now let us generalize to an integer k ≥ 2. Let L be a list of n elements. Divide L into k disjoint contiguous sublists L1, L2, ..., Lk of nearly equal length. Some Li's (namely, n rem k of them, so possibly none) will have length ⌊n/k⌋ + 1 -- for reasons that will become clear later, let these have the low indices: L1, L2, ... . Other Li's will have length ⌊n/k⌋, and are to have high indices: ..., Lk−1, Lk.

We intend to recursively sort the Li's and then merge the k results into an answer list. The expense of our k-ary sort is completely determined by the cost of merging k sorted lists. Here are three alternative algorithms for merging k sorted lists. Note below that we do not assume the source lists to have approximately equal lengths.

(1) Linear-Search-Merge: Find the smallest of k items (one from each of the k sorted source lists), at a cost of k−1 BO's. Move the smallest into the answer list and replace it by its neighbor (the next largest element) in the source list from which it came. Again there are k items, from among which the smallest is to be selected. (When a list exhausts, the last moved item has no replacement, so next we find the smallest of fewer than k items.)

(2) Heap-Merge: k items (one from each sorted source list) are maintained in a heap (under discipline: root = smallest). Move the smallest item
into the answer list, replace the moved item by its neighbor in the source list from which it came, and then, with cost 2⌈log k⌉ BO's, re-heapify. (When a list exhausts, the last moved item is not replaced, and we re-heapify to a heap with fewer than k items.)

(3) Divide-and-Conquer-Merge: recursively merge the first ⌊k/2⌋ lists, recursively merge the last ⌈k/2⌉ lists, then MERGE the two results. (If k = 2 then just MERGE; if k = 1 then output = input.)

We shall show that Divide-and-Conquer-Merge performs the fewest BO's among these three alternatives for doing k-way merging.

The problem of k-way merging has been studied before, in the context of external sorting. See [1, Chapter 11, especially pages 354-355]; also see [10, especially section 5.4.1, pages 251-253]. In the cited references the authors in passing generally assume that Heap-Merge is used to merge k lists (we might call them short-ish) stored in central memory. But this is a minor interest to the authors, for they are mostly concerned with those problems peculiar to external sorting, namely, minimizing accesses of external memory such as tapes, which amounts to the judicious building and arranging of "runs" (sets of adjacent records that are in sorted order) on k very long tapes.

For external sorting, Heap-Merge is the sensible choice and Divide-and-Conquer-Merge is not. Heap-Merge makes one sequential pass through each of its k source lists; for external sorting this is appropriate. The recursive algorithm Divide-and-Conquer-Merge revisits its input records; for external sorting this has the undesirable effect of increasing the number of accesses of external memory (and unless tapes can be read backwards, also the number of tape rewinds will increase).

Notation: Let n be the sum of the lengths of the k source lists. Also, D&C-Merge abbreviates the name Divide-and-Conquer-Merge.

When the k source lists all exhaust at nearly the same time, Linear-Search-Merge performs slightly fewer than (k−1)n BO's. (The exact worst-case number of BO's made by Linear-Search-Merge is (k−1)(n − k/2), and it is a pleasant induction argument on k to show this.)

When the k source lists all exhaust at nearly the same time, Heap-Merge performs approximately 2n⌈log k⌉ BO's.

Now we shall show that, if the k lists are presented in decreasing order of length, then D&C-Merge performs at most

n ⌈log k⌉ − (n/k) 2^⌈log k⌉ + n − k + 1

BO's, which is always ≤ n ⌈log k⌉ BO's, so about half as many BO's as Heap-Merge. In fairness, the actual runtimes of D&C-Merge can be expected to approximate and perhaps exceed the runtimes of Heap-Merge. Both have other expenses besides BO's and in particular D&C-Merge has recursion expenses, though these can be reduced by using a stack variable to simulate recursive procedure calls. We shall re-compare Heap-Merge and Divide-and-Conquer-Merge in the section on parallelism.

Now to bound D&C-Merge's expense. First we need a lemma. Below, function "len" is the length function.

Lemma: Let L1, L2, ..., Lk be lists, where k is odd, and suppose

len(L1) ≥ len(L2) ≥ ... ≥ len(Lk).

Denote j = ⌊k/2⌋ (which here is (k−1)/2),

A = len(L1) + len(L2) + ... + len(Lj),
B = len(Lj+1) + len(Lj+2) + ... + len(Lk),
n = A + B.

Then
(1) B − A ≤ n/k,
(2) B − A = n/k if and only if all the lists have the same length,
(3) A/(k−1) + B/(k+1) ≥ n/k, with equality holding if and only if all the lists have the same length.
Proof:

B − A = len(Lk) − [len(L1) − len(Lk−1)] − [len(L2) − len(Lk−2)] − ... − [len(Lj) − len(Lj+1)]
      ≤ len(Lk)   -- since len(Li) − len(Lk−i) ≥ 0, for i ≤ j
      ≤ n/k       -- since the shortest list has length ≤ average length.

This gives (1), and implies (2). Part (3) follows from (1) and (2).

Theorem 1: Let L1, L2, ..., Lk be sorted lists that satisfy

len(L1) ≥ len(L2) ≥ ... ≥ len(Lk).

Let n be the sum of their lengths. Then Divide-and-Conquer-Merge performs at most

(1) n ⌈log k⌉ − (n/k) 2^⌈log k⌉ + n − k + 1

BO's in merging these lists.

Proof: The proof is by induction on k. When k = 2, formula (1) becomes n − 1, which is correct for the (maximum) number of BO's performed by MERGE when it merges two lists whose lengths sum to n. Now assume the desired bound holds whenever h < k and D&C-Merge merges h lists (whose lengths descend). Denoting j = ⌊k/2⌋, we note that list set L1, L2, ..., Lj, and list set Lj+1, Lj+2, ..., Lk are also in descending order of length. Let

A = the sum of the lengths of the first ⌊k/2⌋ lists,
B = the sum of the lengths of the last ⌈k/2⌉ lists.

There will be two cases, one of which has two subcases.

Case 1: k is even. By induction the number of BO's is at most the sum

  A ⌈log(k/2)⌉ − (A/(k/2)) 2^⌈log(k/2)⌉ + A − k/2 + 1
+ B ⌈log(k/2)⌉ − (B/(k/2)) 2^⌈log(k/2)⌉ + B − k/2 + 1
+ n − 1

where the first two lines are the costs of recursion and the third line is the cost of MERGE. Next using the relations

A + B = n,
⌈log(k/2)⌉ = ⌈log k − 1⌉ = ⌈log k⌉ − 1,

our sum easily simplifies to expression (1).

Case 2: k is odd. Then ⌊k/2⌋ = (k−1)/2, ⌈k/2⌉ = (k+1)/2, and by induction the number of BO's is at most

  A ⌈log((k−1)/2)⌉ − (A/((k−1)/2)) 2^⌈log((k−1)/2)⌉ + A − (k−1)/2 + 1
+ B ⌈log((k+1)/2)⌉ − (B/((k+1)/2)) 2^⌈log((k+1)/2)⌉ + B − (k+1)/2 + 1
+ n − 1

which simplifies to expression *E* =

  A ⌈log(k−1)⌉ + B ⌈log(k+1)⌉
− (A/(k−1)) 2^⌈log(k−1)⌉ − (B/(k+1)) 2^⌈log(k+1)⌉
+ n − k + 1

Subcase 2.1: the odd number k is not of the form 1 + 2^p (for some positive integer p). Then

⌈log(k−1)⌉ = ⌈log k⌉ = ⌈log(k+1)⌉

so formula *E* simplifies to

n ⌈log k⌉ + n − k + 1 − [A/(k−1) + B/(k+1)] 2^⌈log k⌉

which is less than or equal to formula (1) of the Theorem's statement if and only if

A/(k−1) + B/(k+1) ≥ n/k.

The latter holds by the lemma.

Subcase 2.2: the odd number k is of the form 1 + 2^p. Then

⌈log(k+1)⌉ = ⌈log k⌉ = 1 + ⌈log(k−1)⌉,
2^⌈log(k−1)⌉ = k − 1,
2^⌈log k⌉ = 2^⌈log(k+1)⌉ = 2(k − 1),

so *E* becomes

A (⌈log k⌉ − 1) + B ⌈log k⌉ − A − (B/(k+1)) 2^⌈log k⌉ + n − k + 1

which simplifies to

n ⌈log k⌉ + n − k + 1 − [2A + (B/(k+1)) 2(k − 1)]

which is less than or equal to formula (1) if and only if

2A + (B/(k+1)) 2(k − 1) ≥ (n/k) 2^⌈log k⌉ = (n/k) 2(k − 1),

that is, if and only if

A/(k−1) + B/(k+1) ≥ n/k,

which once again the lemma tells us holds.

Notes:

(1) Examining the above proof and the lemma, it follows that if the k lists all have the same length, then the worst-case number of BO's performed by Divide-and-Conquer-Merge exactly equals

n ⌈log k⌉ − (n/k) 2^⌈log k⌉ + n − k + 1.

(This expression is an integer, since n is a multiple of k.) Thus our bound is tight, in the sense that it is achieved, for infinitely many n (namely, all the multiples of k). Of course, the previous sentence is true for all k.

(2) The proof and lemma also show that if the k lists do not all have the same length, then the theorem's bound is strictly greater than the actual worst-case number of BO's that get performed. When the lists are of rather disparate lengths, the actual worst-case number of BO's performed can be considerably less than the theorem's bound. An extreme example is illustrative. Let k = 3, so that the theorem's bound is (5/3)n − 2. If three lists have respective lengths n−2, 1, 1 then D&C-Merge groups them as indicated by the parenthesization (n−2, (1, 1)) and so will perform at most 0 + 1 + (n−1) = n BO's in merging them, not (5/3)n − 2, so the theorem's bound is about 66% too big, for these three lengths.

The preceding paragraph should not cause discouragement about the theorem's bound. The theorem is to be thought of as quantified over all sets of k lists whose lengths sum to n. There are sets whose lengths are nearly equal (to n/k) and for such sets the theorem's bound is quite near the actual worst-case number of BO's performed. For example, if k = 9 and n = 9005 (which is halfway between two multiples of 9) then for the following list lengths (parenthesized to mirror how recursion groups the lists),

( ((1001 1001) (1001 1001))
  ((1001 1000) (1000 (1000 1000))) )

the worst-case number of BO's is actually 29007, whereas the theorem's bound is 29008.11. Since costs as we compute them are integers, this same example shows that floor-ing expression (1) improves the bound but does not in every case calculate exactly the worst-case number of BO's performed by D&C-Merge.

(3) If k is a power of 2 then the theorem's bound simplifies to

n log k − k + 1.

(4) If the k source lists are not initially arranged in descending order of length, then a one-time up-front cost of ≈ k log k will make them so, and the remaining cost of merging them is as stated in the theorem. Thus total cost ≈ (n + k) log k, which ≈ n log k for typical k and n.
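Theorem 1 and Notes (1) and (2) can be checked with a short sketch. The code below (Python, our choice; the paper itself gives only pseudocode) counts worst-case BO's the way the paper does: each two-way MERGE of lists of total size s contributes s − 1 comparisons in the worst case, summed over the recursion tree of D&C-Merge.

```python
import math

def ceil_log2(x: int) -> int:
    return math.ceil(math.log2(x))

def dc_merge_worst(lengths):
    """Worst-case BO count of D&C-Merge on sorted lists of the given
    lengths: recursively handle the first floor(k/2) and the last
    ceil(k/2) lists, then one trailing MERGE, whose worst case costs
    (sum of the two sizes) - 1 comparisons."""
    k = len(lengths)
    if k == 1:
        return 0
    j = k // 2                               # first floor(k/2) lists
    return (dc_merge_worst(lengths[:j]) + dc_merge_worst(lengths[j:])
            + sum(lengths) - 1)

def theorem_bound(k, n):
    # Formula (1): n*ceil(log k) - (n/k)*2^ceil(log k) + n - k + 1
    return n * ceil_log2(k) - (n / k) * 2 ** ceil_log2(k) + n - k + 1

# Note (2)'s example: k = 9, n = 9005, lengths 1001 x 5 and 1000 x 4.
lengths = [1001] * 5 + [1000] * 4
assert dc_merge_worst(lengths) == 29007
assert round(theorem_bound(9, 9005), 2) == 29008.11

# Note (1): for equal-length lists the bound is met exactly.
assert dc_merge_worst([1000] * 9) == theorem_bound(9, 9000)
```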
Optimality of Halving

Our algorithm Divide-and-Conquer-Merge plays the divide-and-conquer game by halving the number of lists to be merged. Intuition suggests that halving is the best way of dividing up the lists. Indeed, this is so, at least in the sense we now describe.

Call an algorithm for merging k sorted lists a D&CM-Algorithm if it takes the form

if k = 1 then output = input
elsif k = 2 then MERGE
else
  partition the k lists into subsets, say, j of them, 1 < j < k;
  recurse on each of the j subsets;
  recurse on the j results;
end if;

common list length is n/k), what we must show is that the cost of merging m lists, then another k−m lists, followed by a trailing MERGE, that is, cost

  (mn/k) ⌈log m⌉ − (n/k) 2^⌈log m⌉ + mn/k − m + 1
+ ((k−m)n/k) ⌈log(k−m)⌉ − (n/k) 2^⌈log(k−m)⌉ + (k−m)n/k − (k−m) + 1
+ n − 1

is greater than or equal to formula (2). After simplification, what we must show is that, for any m in the set {1, 2, 3, ..., k−1},

m ⌈log m⌉ − 2^⌈log m⌉