
k-way merging and k-ary sorts

William A. Greene
Computer Science Department
University of New Orleans
New Orleans, LA 70148
wagcs@uno.edu

Abstract: We present a divide-and-conquer algorithm for merging k sorted lists, namely, recursively merge the first ⌊k/2⌋ lists, do likewise for the last ⌈k/2⌉ lists, then merge the two results. We get a tight bound for the expense, in comparisons made between list elements, of this merge. We show the algorithm is cheapest among all similar divide-and-conquer approaches to k-way merging. We compute the expense of the k-ary sort, which, in analogy to the binary sort, divides its input list into k sublists. Sometimes the k-ary sort has the same expense as the binary sort. Finally we briefly consider parallelizing these algorithms.

Introduction

Knuth tells us [10, p. 161] that the merge sort -- here we shall call it the binary sort -- was one of the first sorting algorithms suggested for computer implementation. There has been continuing interest in speeding up the merge operation (for instance, Sprugnoli [12], Carlsson [3], Thanh et al. [13], Dudzinski and Dydek [5], Brown and Tarjan [2], Trabb Pardo [14], Hwang and Lin [9]) and also interest in the speed-ups possible through parallelism (Cole [4], Shiloach and Vishkin [11], Hirschberg [8], Gavril [7], Even [6]). In all the references listed, the authors have considered the case of merging two sorted lists. In this paper we shall study the merging of k ≥ 2 sorted lists. The topic of k-way merging has been considered before, but only lightly and in the context of external sorting ([1], [10]).

We present a simple divide-and-conquer algorithm for k-way merging. The algorithm resembles the merge sort itself: first recursively merge the first ⌊k/2⌋ lists, then do likewise for the last ⌈k/2⌉ lists, and finally merge the two results. We obtain a good tight bound on the number of comparisons between list elements made by our divide-and-conquer k-way merge algorithm. We show that the algorithm does the fewest comparisons among all similar divide-and-conquer approaches to k-way merging. We compute the cost (in comparisons) of the k-ary sort, which generalizes the binary sort by dividing its input list into (not 2 but) k approximately equal-sized sublists; sometimes the cost of the k-ary sort is identical to that of the binary sort. Finally we briefly consider parallelizing our algorithms.

In this paper we will always sort lists into ascending (versus descending) order. Lists are assumed to be sequentially implemented (versus a linked implementation). The floor ⌊x⌋ and ceiling ⌈x⌉ functions have their usual meanings: for a real number x,

⌊x⌋ = greatest integer i such that i ≤ x,
⌈x⌉ = least integer i such that i ≥ x.

"log" will mean the base-two logarithm log₂; a logarithm to some other base b will be explicitly subscripted log_b. When measuring the run-times of algorithms we will consider worst-case run-times.

The binary sort achieves its good runtime by a divide-and-conquer strategy, namely, that of halving the list being sorted: the front and back halves of the list are recursively sorted separately, then the two results are merged into the answer list. An implementation is

procedure BINARY_SORT(L: in/out List_type);
  -- below, n denotes length(L)
  local L1, L2: List_type;
begin
  if length(L) > 1 then
    L1 := L(1 .. ⌈n/2⌉);
    L2 := L(⌈n/2⌉ + 1 .. n);
    BINARY_SORT(L1);
    BINARY_SORT(L2);
    MERGE(L1, L2, L);
  end if;
end;

The procedure MERGE(L1, L2: in List_type; L: out List_type) that we have in mind for sorting two lists is described as follows. Initialize pointers to the first item in each list L1, L2, and then

repeat
  compare the two items pointed at;
  move the smaller into L;
  advance the corresponding pointer to the smaller's neighbor;
until one of L1, L2 exhausts;
drain the remainder of the unexhausted list into L;
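
For concreteness, here is a minimal executable rendering of MERGE in Python (a sketch of ours, not the paper's notation; the function name merge and the explicit comparison counter are our own additions). It returns the merged list together with the number of element comparisons performed.

def merge(l1, l2):
    # merge two sorted lists, counting BO's (element comparisons)
    out = []
    i = j = comparisons = 0
    while i < len(l1) and j < len(l2):
        comparisons += 1                  # one BO per loop pass
        if l1[i] <= l2[j]:
            out.append(l1[i]); i += 1
        else:
            out.append(l2[j]); j += 1
    out.extend(l1[i:])                    # draining: no BO's
    out.extend(l2[j:])
    return out, comparisons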

The basic operation (BO) we shall count, to obtain the (worst-case) cost of algorithms, will be that of comparing two list elements. If the two input lists to MERGE have lengths m and n respectively, then MERGE does at most m + n − 1 BO's (the draining does comparisons, but not of list elements). (MERGE does at least min{m, n} BO's.) If |m − n| ≤ 1 then algorithm MERGE does as few BO's as any algorithm for merging two sorted lists [10, Theorem 5.3.2−M].

If f(n) = the (worst-case) number of BO's done by BINARY_SORT in sorting a list of length n, then f satisfies

(1) f(1) = 0
(2) f(n) = n − 1 + f(⌈n/2⌉) + f(⌊n/2⌋)

When n is a power of 2, the second equation becomes

(2′) f(n) = n − 1 + 2f(n/2)

which has solution

f(n) = n log₂n − n + 1, n a power of 2.

The general solution is

f(n) = n⌈log n⌉ − 2^⌈log n⌉ + 1

for arbitrary integers n ≥ 1 [1, Ex. 9.12]. In particular, binary sort's runtime is O(n log n).
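
As a sanity check, the general solution can be verified mechanically against recurrence (2) for small n; the following Python snippet (our own check, not part of the paper) does so.

from math import ceil, log2

def f_rec(n):
    # recurrence (2): f(n) = n - 1 + f(ceil(n/2)) + f(floor(n/2))
    return 0 if n == 1 else n - 1 + f_rec((n + 1) // 2) + f_rec(n // 2)

def f_closed(n):
    # general solution: n*ceil(log n) - 2**ceil(log n) + 1
    c = ceil(log2(n))
    return n * c - 2 ** c + 1

assert all(f_rec(n) == f_closed(n) for n in range(1, 200))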

k-way merging

Now let us generalize to an integer k ≥ 2. Let L be a list of n elements. Divide L into k disjoint contiguous sublists L1, L2, ..., Lk of nearly equal length. Some Li's (namely, n rem k of them, so possibly none) will have length ⌊n/k⌋ + 1 -- for reasons that will become clear later, let these have the low indices: L1, L2, ... . Other Li's will have length ⌊n/k⌋, and are to have high indices: ..., Lk−1, Lk.

We intend to recursively sort the Li's and then merge the k results into an answer list. The expense of our k-ary sort is completely determined by the cost of merging k sorted lists. Here are three alternative algorithms for merging k sorted lists. Note below that we do not assume the source lists to have approximately equal lengths.

(1) Linear-Search-Merge: Find the smallest of k items (one from each of the k sorted source lists), at a cost of k−1 BO's. Move the smallest into the answer list and replace it by its neighbor (the next largest element) in the source list from which it came. Again there are k items, from among which the smallest is to be selected. (When a list exhausts, the last moved item has no replacement, so next we find the smallest of fewer than k items.)

(2) Heap-Merge: k items (one from each sorted source list) are maintained in a heap (under discipline: root = smallest). Move the smallest item into the answer list, replace the moved item by its neighbor in the source list from which it came, and then, with cost 2⌈log k⌉ BO's, re-heapify. (When a list exhausts, the last moved item is not replaced, and we re-heapify to a heap with fewer than k items.)

(3) Divide-and-Conquer-Merge: recursively merge the first ⌊k/2⌋ lists, recursively merge the last ⌈k/2⌉ lists, then MERGE the two results. (If k = 2 then just MERGE; if k = 1 then output = input.)
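
Continuing the Python sketch begun above (again ours, not the paper's), Divide-and-Conquer-Merge can be rendered as follows; it threads BO counts through the recursion so that the bound derived below can be checked empirically.

def dc_merge(lists):
    # merge k sorted lists: recursively merge the first floor(k/2)
    # lists and the last ceil(k/2) lists, then MERGE the two results
    k = len(lists)
    if k == 1:
        return lists[0], 0               # output = input
    if k == 2:
        return merge(lists[0], lists[1]) # just MERGE
    j = k // 2                           # floor(k/2)
    left, c1 = dc_merge(lists[:j])
    right, c2 = dc_merge(lists[j:])
    out, c3 = merge(left, right)
    return out, c1 + c2 + c3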

We shall show that Divide-and-Conquer-Merge performs the fewest BO's among these three alternatives for doing k-way merging.

The problem of k-way merging has been studied before, in the context of external sorting. See [1, Chapter 11, especially pages 354-355]; also see [10, especially section 5.4.1, pages 251-253]. In the cited references the authors in passing generally assume that Heap-Merge is used to merge k lists (we might call them short-ish) stored in central memory. But this is a minor interest to the authors, for they are mostly concerned with those problems peculiar to external sorting, namely, minimizing accesses of external memory such as tapes, which amounts to the judicious building and arranging of "runs" (sets of adjacent records that are in sorted order) on k very long tapes.

For external sorting, Heap-Merge is the sensible choice and Divide-and-Conquer-Merge is not. Heap-Merge makes one sequential pass through each of its k source lists; for external sorting this is appropriate. The recursive algorithm Divide-and-Conquer-Merge revisits its input records; for external sorting this has the undesirable effect of increasing the number of accesses of external memory (and unless tapes can be read backwards, also the number of tape rewinds will increase).

Notation: Let n be the sum of the lengths of the k source lists. Also, D&C-Merge abbreviates the name Divide-and-Conquer-Merge.

When the k source lists all exhaust at nearly the same time, Linear-Search-Merge performs slightly fewer than (k−1)n BO's. (The exact worst-case number of BO's made by Linear-Search-Merge is (k−1)(n − k/2), and it is a pleasant induction argument on k to show this.)

When the k source lists all exhaust at nearly the same time, Heap-Merge performs approximately 2n log k BO's.

Now we shall show that, if the k lists are presented in decreasing order of length, then D&C-Merge performs at most

n⌈log k⌉ − (n/k)2^⌈log k⌉ + n − k + 1

BO's, which is always ≤ n⌈log k⌉ BO's, so about half as many BO's as Heap-Merge. In fairness, the actual runtimes of D&C-Merge can be expected to approximate and perhaps exceed the runtimes of Heap-Merge. Both have other expenses besides BO's, and in particular D&C-Merge has recursion expenses, though these can be reduced by using a stack variable to simulate recursive procedure calls. We shall re-compare Heap-Merge and Divide-and-Conquer-Merge in the section on parallelism.

Now to bound D&C-Merge's expense. First we need a lemma. Below, function "len" is the length function.

Lemma: Let L1, L2, ..., Lk be lists, where k is odd, and suppose

len(L1) ≥ len(L2) ≥ ... ≥ len(Lk).

Denote j = ⌊k/2⌋ (which here is (k−1)/2),

A = len(L1) + len(L2) + ... + len(Lj),
B = len(Lj+1) + len(Lj+2) + ... + len(Lk),
n = A + B.

Then

(1) B − A ≤ n/k,
(2) B − A = n/k if and only if all the lists have the same length,
(3) A/(k−1) + B/(k+1) ≥ n/k, with equality holding if and only if all the lists have the same length.

Proof:

B − A = len(Lk) − [len(L1) − len(Lk−1)] − [len(L2) − len(Lk−2)] − ... − [len(Lj) − len(Lj+1)]
      ≤ len(Lk)    -- since len(Li) − len(Lk−i) ≥ 0, for i ≤ j
      ≤ n/k        -- since the shortest list has length ≤ average length.

This gives (1), and implies (2). Part (3) follows from (1) and (2).

Theorem 1: Let L1, L2, ..., Lk be sorted lists that satisfy

len(L1) ≥ len(L2) ≥ ... ≥ len(Lk).

Let n be the sum of their lengths. Then Divide-and-Conquer-Merge performs at most

(1) n⌈log k⌉ − (n/k)2^⌈log k⌉ + n − k + 1

BO's in merging these lists.

Proof: The proof is by induction on k. When k = 2, formula (1) becomes n − 1, which is correct for the (maximum) number of BO's performed by MERGE when it merges two lists whose lengths sum to n. Now assume the desired bound holds whenever h < k and D&C-Merge merges h lists (whose lengths descend). Denoting j = ⌊k/2⌋, we note that list set L1, L2, ..., Lj, and list set Lj+1, Lj+2, ..., Lk are also in descending order of length. Let

A = the sum of the lengths of the first ⌊k/2⌋ lists,
B = the sum of the lengths of the last ⌈k/2⌉ lists.

There will be two cases, one of which has two subcases.

Case 1: k is even. By induction the number of BO's is at most the sum

A⌈log(k/2)⌉ − (A/(k/2))2^⌈log(k/2)⌉ + A − k/2 + 1
+ B⌈log(k/2)⌉ − (B/(k/2))2^⌈log(k/2)⌉ + B − k/2 + 1
+ n − 1

where the first two lines are the costs of recursion and the third line is the cost of MERGE. Next using the relations

A + B = n,
⌈log(k/2)⌉ = ⌈log k − 1⌉ = ⌈log k⌉ − 1,

our sum easily simplifies to expression (1).

Case 2: k is odd. Then ⌊k/2⌋ = (k−1)/2, ⌈k/2⌉ = (k+1)/2, and by induction the number of BO's is at most

A⌈log((k−1)/2)⌉ − (A/((k−1)/2))2^⌈log((k−1)/2)⌉ + A − (k−1)/2 + 1
+ B⌈log((k+1)/2)⌉ − (B/((k+1)/2))2^⌈log((k+1)/2)⌉ + B − (k+1)/2 + 1
+ n − 1

which simplifies to expression *E* =

A⌈log(k−1)⌉ + B⌈log(k+1)⌉ − (A/(k−1))2^⌈log(k−1)⌉ − (B/(k+1))2^⌈log(k+1)⌉ + n − k + 1

Subcase 2.1: the odd number k is not of the form 1 + 2^p (for some positive integer p). Then

⌈log(k−1)⌉ = ⌈log k⌉ = ⌈log(k+1)⌉

so formula *E* simplifies to

n⌈log k⌉ + n − k + 1 − [A/(k−1) + B/(k+1)]2^⌈log k⌉

which is less than or equal to formula (1) of the Theorem's statement if and only if

A/(k−1) + B/(k+1) ≥ n/k

The latter holds by the lemma.

Subcase 2.2: the odd number k is of the form 1 + 2^p. Then

⌈log(k+1)⌉ = ⌈log k⌉ = 1 + ⌈log(k−1)⌉,
2^⌈log(k−1)⌉ = k − 1,
2^⌈log k⌉ = 2^⌈log(k+1)⌉ = 2(k − 1),

so *E* becomes

A(⌈log k⌉ − 1) + B⌈log k⌉ − A − (B/(k+1))2^⌈log k⌉ + n − k + 1

which simplifies to

n⌈log k⌉ + n − k + 1 − [2A + (B/(k+1))·2(k−1)]

which is less than or equal to formula (1) if and only if

2A + (B/(k+1))·2(k−1) ≥ (n/k)2^⌈log k⌉ = (n/k)·2(k−1)

that is, if and only if

A/(k−1) + B/(k+1) ≥ n/k

which once again the lemma tells us holds.

Notes:

(1) Examining the above proof and the lemma, it follows that if the k lists all have the same length, then the worst-case number of BO's performed by Divide-and-Conquer-Merge exactly equals

n⌈log k⌉ − (n/k)2^⌈log k⌉ + n − k + 1.

Thus our bound is tight, in the sense that it is achieved, for infinitely many n (namely, all the multiples of k). Of course, the previous sentence is true for all k.

(2) The proof and lemma also show that if the k lists do not all have the same length, then the theorem's bound is strictly greater than the actual worst-case number of BO's that get performed. When the lists are of rather disparate lengths, the actual worst-case number of BO's performed can be considerably less than the theorem's bound. An extreme example is illustrative. Let k = 3, so that the theorem's bound is (5/3)n − 2. If three lists have respective lengths n−2, 1, 1 then D&C-Merge groups them as indicated by the parenthesization (n−2, (1, 1)) and so will perform at most 0 + 1 + (n−1) = n BO's in merging them, not (5/3)n − 2, so the theorem's bound is about 66% too big, for these three lengths.

The preceding paragraph should not cause discouragement about the theorem's bound. The theorem is to be thought of as quantified over all sets of k lists whose lengths sum to n. There are sets whose lengths are nearly equal (to n/k) and for such sets the theorem's bound is quite near the actual worst-case number of BO's performed. For example, if k = 9 and n = 9005 (which is halfway between two multiples of 9) then for the following list lengths (parenthesized to mirror how recursion groups the lists),

(((1001 1001) (1001 1001)) ((1001 1000) (1000 (1000 1000))))

the worst-case number of BO's is actually 29007, whereas the theorem's bound is 29008.11. Since costs as we compute them are integers, this same example shows that floor-ing expression (1) improves the bound but does not in every case calculate exactly the worst-case number of BO's performed by D&C-Merge.

(3) If k is a power of 2 then the theorem's bound simplifies to

n log k − k + 1.

(This expression is an integer, since n is a multiple of k.)

(4) If the k source lists are not initially arranged in descending order of length, then a one-time up-front cost of ≈ k log k will make them so, and the remaining cost of merging them is as stated in the theorem. Thus total cost ≈ (n + k) log k, which ≈ n log k for typical k and n.
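
For experimentation, the bound of Theorem 1 is easy to evaluate; the helper below (ours, not the paper's) computes expression (1). The printed values reproduce the k = 9 figure and the k = 3 bound (5/3)n − 2 of note (2), and the power-of-2 simplification of note (3).

from math import ceil, log2

def dc_bound(n, k):
    # expression (1): n*ceil(log k) - (n/k)*2**ceil(log k) + n - k + 1
    c = ceil(log2(k))
    return n * c - (n / k) * 2 ** c + n - k + 1

print(dc_bound(9005, 9))   # 29008.11..., as in note (2)
print(dc_bound(300, 3))    # (5/3)*300 - 2 = 498.0
print(dc_bound(1024, 8))   # n*log k - k + 1 = 3065.0, as in note (3)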

Optimality of Halving

Our algorithm Divide-and-Conquer-Merge plays the divide-and-conquer game by halving the number of lists to be merged. Intuition suggests that halving is the best way of dividing up the lists. Indeed, this is so, at least in the sense we now describe.

Call an algorithm for merging k sorted lists a D&CM-Algorithm if it takes the form

if k = 1 then output = input
elsif k = 2 then MERGE
else
  partition the k lists into subsets, say, j of them, 1 < j < k;
  recurse on each of the j subsets;
  recurse on the j results;
end if;
end if; m log m – 2 log m

An example of partitioning k = 9 lists into j = 3 + ( k – m ) log ( k – m ) – 2 log ( k – m )


subsets is given by ( (L1, L2, L3), (L4, L5, L6, L7),
(L8, L9) ). We do not insist that the number j of ≥ k log k – 2 log k –k
subsets is the same on every call. On the non-
recursive level, what is happening is that such an By symmetry, it suffices to demonstrate this for any
algorithm is doing a sequence of MERGE's, the last m∈{1, 2, ..., k ⁄ 2 }. We reason as follows.
of which is the MERGE of two lists L*1 and L*2, For real numbers x ≥ 1, define f(x) =
where L*1 (resp., L*2) is obtained from the x log x – 2 log x . Function f consists of linear
merging of m (resp., k−m) of the k source lists by
pieces; for instance,
some D&CM-Algorithm. The next proposition
shows that partitioning into halves is optimal, when on interval (2p−1, 2p], f(x) = p x − 2p,
merging lists which all have the same length. on interval (2p, 2p+1], f(x) = (p+1) x − 2p+1.
Moreover, since p2p − 2p equals the limit, as x
Theorem 2: To merge k sorted lists which all have approaches 2p from the right, of (p+1) x − 2p+1,
the same length and whose lengths sum to n, a we also conclude function f is continuous (in the
D&CM-Algorithm must do at least mathematical sense). Thus the graph of f can be
described as: a straight line segment of slope 1,
(2) n log k – ( n ⁄ k )2 log k +n–k+1 connected to a straight line segment of slope 2,
BO's in the worst-case. connected to a straight line segment of slope 3,
connected to ... and so on. Consequently, the
difference between the values of f at two consec-
Proof: We induct on k. When k = 2, formula (2) utive integers, f(m) − f(m−1), can be calculated as
gives MERGE's familiar bound. Now assume the soon as we know which interval (2q−1, 2q]
desired result holds whenever 0 < m < k and m contains m, for then the difference f(m) − f(m−1)
lists all of the same length are merged by a D&CM- must equal the slope, which is q.
Algorithm. From the paragraph preceding the
statement of this theorem (and noting that the Recall the integer k of this proposition. Let p be
the integer that satisfies 2p < k ≤ 2p+1. For integers imagine: invoking D&C-Merge to merge the
m∈{1, 2, ..., k ⁄ 2 } define g(m) = f(m) + f(k−m) subset of m lists, invoking D&C-Merge to merge
and note that for such m, the subset of k−m lists, then MERGE-ing the two
m ≤ k/2 ≤ 2p implies: f(m) − f(m−1) ≤ p, results. The (worst-case) number of BO's so
performed must exactly equal the number
k−m ≥ k/2 > 2p−1 implies: f(k−m+1)−f(k−m) ≥ p. performed when D&C-Merge is called to merge 24
Then g(m−1) − g(m) lists. That equality must hold follows from this
= f(m−1) + f(k−m+1) − f(m) − f(k−m) note's first paragraph and from examining the prop-
osition's proof.
= (f(k−m+1) − f(k−m)) − (f(m) − f(m−1))
≥ p − p = 0. The k-ary sort
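
The behavior of g is easy to inspect numerically; the following spot-check (our own experiment, not part of the paper) confirms, for a range of k, that g is decreasing and that its least value matches the displayed formula.

from math import ceil, log2

def f(x):
    # f(x) = x*ceil(log x) - 2**ceil(log x)
    c = ceil(log2(x))
    return x * c - 2 ** c

for k in range(2, 65):
    g = [f(m) + f(k - m) for m in range(1, k // 2 + 1)]
    assert all(a >= b for a, b in zip(g, g[1:]))   # g is decreasing
    assert g[-1] == k * ceil(log2(k)) - 2 ** ceil(log2(k)) - k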

Note: Nature is capable of remarkable economies! In the notation of the proposition, k/2∈(2^(p−1), 2^p], and m and k−m lie on either side of k/2. If m, k−m both fall into interval (2^(p−1), 2^p] then it is easily shown that g(m) = f(m) + f(k−m) will equal g's least value g(⌊k/2⌋). By the continuity of f, the same can be said even if m is the stranded endpoint 2^(p−1).

We might express these matters by saying that halving is optimal but other partitions can achieve equally good results. For instance: recall the algorithm D&C-Merge = Divide-and-Conquer-Merge (the halver). Let k = 24 (= 3 times a power of 2, so, halfway between two powers of 2). If 24 lists (of equal length) are partitioned into two subsets, the sizes of the subsets are a pair of numbers that sum to 24. Five such pairs are

(8,16), (9,15), (10,14), (11,13), (12,12).

(D&C-Merge -- the halver -- would use the last pair.) For any one of these five pairs (m, k−m), imagine: invoking D&C-Merge to merge the subset of m lists, invoking D&C-Merge to merge the subset of k−m lists, then MERGE-ing the two results. The (worst-case) number of BO's so performed must exactly equal the number performed when D&C-Merge is called to merge 24 lists. That the equality must hold follows from this note's first paragraph and from examining the proposition's proof.

The k-ary sort

Now let us return to the k-ary sort, which divides its unsorted input list into k sublists of nearly equal length and makes k recursive calls, followed by a call of Divide-and-Conquer-Merge. If f(n) = worst-case number of BO's performed by the k-ary sort when sorting a list of length n, then f satisfies

(1) f(1) = 0
(2) f(n) ≤ n⌈log k⌉ − (n/k)2^⌈log k⌉ + n − k + 1 + (n rem k)f(⌈n/k⌉) + (k − (n rem k))f(⌊n/k⌋)

When n is a power of k, inequality (2) is replaced by equality

(2′) f(n) = n⌈log k⌉ − (n/k)2^⌈log k⌉ + n − k + 1 + kf(n/k)

which has solution

f(n) = (n log_k n)⌈log k⌉ − (n/k)(log_k n)2^⌈log k⌉ + n log_k n − n + 1

as an induction argument (on n = powers of k) will show.

If n is a power of k and k is a power of 2 we get

f(n) = (n log_k n) log₂k − n + 1 = n log₂n − n + 1.

Thus, for example, octary sort (k = 8) performs exactly the same number of BO's as binary sort when sorting lists of length 2^(3m). On reflection, this is not altogether surprising.

In general, the k-ary sort has runtime O(n log n).
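
Assembled from the earlier sketches, the k-ary sort might be rendered in Python as follows (our own sketch; the n rem k longer sublists take the low indices, as specified above).

def kary_sort(xs, k):
    n = len(xs)
    if n <= 1:
        return list(xs)
    # n rem k sublists of length floor(n/k)+1 first, the rest shorter
    sizes = [n // k + 1] * (n % k) + [n // k] * (k - n % k)
    sizes = [s for s in sizes if s > 0]      # guard for n < k
    pieces, start = [], 0
    for s in sizes:
        pieces.append(kary_sort(xs[start:start + s], k))
        start += s
    out, _ = dc_merge(pieces)                # k-way merge of the results
    return out

For example, kary_sort([5, 3, 1, 4, 2], 3) returns [1, 2, 3, 4, 5].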

Parallelism

Heap-Merge does not improve in the presence of parallelism (that is, a multiplicity of processing units operating simultaneously). The recurring expense in Heap-Merge is re-heapifying, and re-heapifying is inherently sequential; it cannot be parallelized.

On the other hand, Divide-and-Conquer-Merge and the k-ary sort can easily be parallelized and thereby sped up, which we now briefly investigate. Using more elaborate algorithms, others have achieved faster runtimes than we shall. The algorithm of Shiloach and Vishkin in section 4.1 of [11] has some outward similarity to our k-ary sort, but their sorter is not recursive and uses a different merging routine. Their runtime is O((n/k) log n) where k = the number of available processors; our runtime will be O((n/log k) log n). Cole's very complicated "cascading" merge [4] achieves a runtime of O(log n) if there are as many processors as there are list elements to be sorted.

So now let us consider the case that there are 16 lists. Ultimately, Divide-and-Conquer-Merge's behavior is to MERGE these by pairs, MERGE the 8 results by pairs, MERGE those 4 results by pairs, etc. Obviously the 8 incarnations of MERGE on the lowest level can run in parallel, and similarly for higher levels. Actually, we can do even better by starting the merging on level m just one tick after starting that on level m+1.

Suppose there are 15 processors, arranged in a full binary tree, in the sense that output from a child processor is input to its parent. The processors we have in mind are quite simple. Each compares two input records from its memory and outputs the smaller into its parent's memory; call that unit of activity a cycle. Each of the 8 leaf processors begins with input consisting of two sorted lists. Let n be the sum of the lengths of these 16 lists. On the fourth cycle the root outputs for the first time (outputting, of course, the smallest element among the 16 lists). On each succeeding cycle the root outputs one more element. After n+3 cycles the 16 lists will have been merged. (One can conceive of short-cuts when lists exhaust early, but the worst-case expense is n+3 cycles.) A cycle is hardly different from a BO as defined earlier. The expense n+3 on the parallel machine should be contrasted with the theorem's expense of ≈ n log₂16 = 4n on a uni-processor machine -- a four-fold speed-up.

Now suppose on our 15-processor machine we have to sort a list L of length n. We do so with a 16-ary sort:

divide the list into sixteenths;
make sixteen recursive calls, one for each sixteenth;
merge, using the parallel merge algorithm;

Each recursive call will also perform a 16-way merge, so will occupy all 15 processors, therefore the 16 recursive calls are to be done sequentially. Let f(n) = (worst-case) number of cycles required to sort L. For n's that are powers of 16,

f(1) = 0,
f(n) = n + 3 + 16f(n/16)

which has solution

f(n) = n log₁₆n + (n−1)/5 = (1/4)n log₂n + (n−1)/5

or 4 times faster than binary sort on a uni-processor machine. For a parallel machine with 2^p − 1 processors, the measurements are: parallel merge completes after n+p−1 cycles; sorting is p times faster than on a uni-processor machine.

If there are as many processors as there are list elements to be sorted, then sorting can become merging, where leaf processors start with a pair of singleton lists; then sorting completes after n + log n cycles. This scenario is overly generous in its use of processors; for instance, after one cycle the leaf-level processors (half of the total) have no more work to do and could be reallocated to elsewhere in the tree.
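
The cycle-count recurrence and its closed form can be sanity-checked numerically (a quick check of ours, not part of the paper):

from math import log

def cycles(n):
    # f(1) = 0;  f(n) = n + 3 + 16*f(n/16)
    return 0 if n == 1 else n + 3 + 16 * cycles(n // 16)

for n in (16, 256, 4096):
    closed = n * log(n, 16) + (n - 1) / 5
    assert cycles(n) == round(closed)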

Summary and Conclusion

Our original interest was in the k-ary sort, which is the generalization of the binary sort to the case of dividing a source list into (not 2 but) k sublists. All the essential expense of the k-ary sort comes from the merging operation. Thus arose our curiosity about ways to do a k-way merging of k sorted lists.

A strategy we named Divide-and-Conquer-Merge was presented, a tight bound was found for its expense, and it was shown less costly than two other strategies for k-way merging (Linear-Search-Merge, Heap-Merge). Our algorithm Divide-and-Conquer-Merge, whose scheme is to recurse on halves of the numbers of source lists being merged, was additionally shown optimal among a class of similar approaches that recurse on subgroups of the source lists. The expense of the k-ary sort was analyzed to be O(n log n); sometimes its expense exactly equals that of the binary sort. We briefly explored parallel implementations of our merging and sorting approaches, and their costs.

We do not expect to see actual use of the k-ary sort, since simpler approaches such as the binary sort are no costlier. K-way merging may see application. The mathematical techniques used in our cost analyses are, to our knowledge, entirely novel and, in our opinion, intellectually stimulating and esthetically appealing. As with certain other instances we might cite in complexity analysis, the proofs are as intriguing as the statements of the theorems.

Bibliography

1. Aho, Alfred V., Hopcroft, John E., and Ullman, Jeffrey D., Data Structures and Algorithms, Addison-Wesley, Reading, Mass., 1983.
2. Brown, Mark R., and Tarjan, Robert E., "A fast merging algorithm", J. Assoc. Comput. Mach. 26 (1979), 211-226.
3. Carlsson, Svante, "Splitmerge - a fast stable merging algorithm", Information Proc. Lett. 22 (1986), 189-192.
4. Cole, Richard, "Parallel merge sort", SIAM J. Computing 17 (1988), 770-785.
5. Dudzinski, Krzysztof, and Dydek, Andrzej, "On a stable minimum storage merging algorithm", Information Proc. Lett. 12 (1981), 5-8.
6. Even, Shimon, "Parallelism in tape-sorting", Communications Assoc. Comput. Mach. 17 (1974), 202-204.
7. Gavril, Fanica, "Merging with parallel processors", Communications Assoc. Comput. Mach. 18 (1975), 588-591.
8. Hirschberg, D. S., "Fast parallel sorting algorithms", Communications Assoc. Comput. Mach. 21 (1978), 657-661.
9. Hwang, F. K., and Lin, S., "A simple algorithm for merging two disjoint linearly ordered sets", SIAM J. Computing 1 (1972), 31-39.
10. Knuth, Donald E., The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, Reading, Mass., 1973.
11. Shiloach, Yossi, and Vishkin, Uzi, "Finding the maximum, merging, and sorting in a parallel computation model", J. Algorithms 2 (1981), 88-102.
12. Sprugnoli, Renzo, "The analysis of a simple in-place merging algorithm", J. Algorithms 10 (1989), 366-380.
13. Thanh, Mai; Alagar, V. S.; and Bui, T. D., "Optimal expected time algorithms for merging", J. Algorithms 7 (1986), 341-357.
14. Trabb Pardo, Luis, "Stable sorting and merging with optimal space and time", SIAM J. Computing 6 (1977), 351-372.
