F8 PDF
Lars Karlsson
2013-02-14
Overview
Part I
Bubble sort
- Applicability: General
- Idea: Sweep through list, compare and swap adjacent
  pairs. Continue until no more swaps are made.
- Complexity: O(n^2) in the worst case
Bubble sort
bool did_swap;
do {
    did_swap = false;
    for (int k = 0; k + 1 < n; ++k) {
        if (a[k] > a[k + 1]) {
            swap(&a[k], &a[k + 1]);
            did_swap = true;
        }
    }
} while (did_swap);
Parallel(?) bubble sort
- Applicability: General
- Idea: Alternately compare and swap all the even and all the
  odd adjacent pairs; the pairs within one phase can be handled
  in parallel. Continue until a full even/odd round makes no swaps.
- Complexity: O(n^2) in the worst case
Odd-even transposition sort
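bool did_swap;
do {
    did_swap = false;
    /* One even phase (pty = 0) and one odd phase (pty = 1) per round.
       Stopping after a single swap-free phase is not enough:
       {1, 3, 2} has no swap in the even phase but is not sorted. */
    for (int pty = 0; pty < 2; ++pty) {
        for (int k = pty; k + 1 < n; k += 2) {
            if (a[k] > a[k + 1]) {
                swap(&a[k], &a[k + 1]);
                did_swap = true;
            }
        }
    }
} while (did_swap);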
Odd-even transposition sort: OpenMP
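bool did_swap;
do {
    did_swap = false;
    /* One even phase (pty = 0) and one odd phase (pty = 1) per round;
       stop only when neither phase made a swap. */
    for (int pty = 0; pty < 2; ++pty) {
        /* OpenMP needs the loop test in canonical form (k < bound). */
        #pragma omp parallel for reduction(||: did_swap)
        for (int k = pty; k < n - 1; k += 2) {
            if (a[k] > a[k + 1]) {
                swap(&a[k], &a[k + 1]);
                did_swap = true;
            }
        }
    }
} while (did_swap);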
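The fragment above leaves `swap`, `a`, and `n` to the surrounding program. A minimal self-contained sketch follows, assuming `int` elements; the helper `swap` and the function name `odd_even_sort` are illustrative choices, not taken from the slides. Compiled without OpenMP support, the pragma is ignored and the code runs serially.

```c
#include <stdbool.h>

/* Hypothetical helper: exchange two ints. */
static void swap(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Sort a[0..n-1] in ascending order by odd-even transposition. */
void odd_even_sort(int n, int a[])
{
    bool did_swap;
    do {
        did_swap = false;
        /* Even phase (pty = 0), then odd phase (pty = 1). */
        for (int pty = 0; pty < 2; ++pty) {
            /* The pairs (k, k+1) of one phase are disjoint, so the
               iterations are independent and can run in parallel. */
            #pragma omp parallel for reduction(||: did_swap)
            for (int k = pty; k < n - 1; k += 2) {
                if (a[k] > a[k + 1]) {
                    swap(&a[k], &a[k + 1]);
                    did_swap = true;
                }
            }
        }
    } while (did_swap);
}
```

A call such as `odd_even_sort(5, a)` sorts a five-element array in place.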
Odd-even transposition sort: Naive approach
- Idea:
  - Block distribute the array
  - Perform each inner loop in parallel
  - Communication to complete cross-border swaps
- Disadvantages:
  - Many synchronization points
Odd-even transposition sort: Hybrid approach
Merge
/* Merge sorted a[0..n-1] and b[0..n-1] into c[0..2n-1]. */
void
merge(int n, int a[], int b[], int c[])
{
    int ka = 0, kb = 0, kc = 0;
    while (ka < n && kb < n) {
        if (a[ka] < b[kb])
            c[kc++] = a[ka++];
        else
            c[kc++] = b[kb++];
    }
    /* Copy whatever remains of a or b. */
    while (ka < n)
        c[kc++] = a[ka++];
    while (kb < n)
        c[kc++] = b[kb++];
}
Merge but keep only lower half
/* Merge sorted a[0..n-1] and b[0..n-1], keeping only the n smallest
   elements in c[0..n-1]. */
void
merge(int n, int a[], int b[], int c[])
{
    int ka = 0, kb = 0, kc = 0;
    while (kc < n) {
        if (a[ka] < b[kb])
            c[kc++] = a[ka++];
        else
            c[kc++] = b[kb++];
    }
}
Merge but keep only upper half
/* Merge sorted a[0..n-1] and b[0..n-1], keeping only the n largest
   elements in c[0..n-1]. */
void
merge(int n, int a[], int b[], int c[])
{
    int ka = n - 1, kb = n - 1, kc = n - 1;
    while (kc >= 0) {
        if (a[ka] > b[kb])
            c[kc--] = a[ka--];
        else
            c[kc--] = b[kb--];
    }
}
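The two half-merges fit a block-based odd-even transposition sort: each process first sorts its own block, then neighbouring processes repeatedly exchange blocks, the left one keeping the lower half of the merge and the right one the upper half. The sketch below simulates this serially and is an assumption about how the pieces combine, not code from the slides. The names `merge_low`, `merge_high`, `block_odd_even_sort`, and `cmp_int` are illustrative, `p` is the number of simulated processes, and `m` is the block size (the array holds `p * m` elements, with `p > 0` and `m > 0` assumed).

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort on ints. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Keep the n smallest elements of sorted a and b in c. */
static void merge_low(int n, const int a[], const int b[], int c[])
{
    int ka = 0, kb = 0, kc = 0;
    while (kc < n) {
        if (a[ka] < b[kb]) c[kc++] = a[ka++];
        else               c[kc++] = b[kb++];
    }
}

/* Keep the n largest elements of sorted a and b in c. */
static void merge_high(int n, const int a[], const int b[], int c[])
{
    int ka = n - 1, kb = n - 1, kc = n - 1;
    while (kc >= 0) {
        if (a[ka] > b[kb]) c[kc--] = a[ka--];
        else               c[kc--] = b[kb--];
    }
}

/* Serial simulation of p processes, each owning m elements of a.
   Returns 0 on success, -1 if memory allocation fails. */
int block_odd_even_sort(int p, int m, int a[])
{
    int *lo = malloc((size_t)m * sizeof *lo);
    int *hi = malloc((size_t)m * sizeof *hi);
    if (!lo || !hi) { free(lo); free(hi); return -1; }

    /* Step 1: every process sorts its local block. */
    for (int q = 0; q < p; ++q)
        qsort(a + q * m, (size_t)m, sizeof a[0], cmp_int);

    /* Step 2: p alternating phases; neighbours (q, q+1) split the
       merge of their two blocks into a lower and an upper half. */
    for (int phase = 0; phase < p; ++phase) {
        for (int q = phase % 2; q + 1 < p; q += 2) {
            int *left = a + q * m, *right = left + m;
            merge_low(m, left, right, lo);
            merge_high(m, left, right, hi);
            memcpy(left, lo, (size_t)m * sizeof *lo);
            memcpy(right, hi, (size_t)m * sizeof *hi);
        }
    }
    free(lo);
    free(hi);
    return 0;
}
```

In a distributed-memory version the two `memcpy` calls would correspond to one block exchange per neighbour pair and phase, so there are far fewer synchronization points than with element-wise swaps.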
Part II
Parallel merge sort
Counting sort
Parallel counting sort
Sources of parallelism:
- Counting
  - Partition the input array
  - Count occurrences with privatized counts
  - Reduce privatized counts to global counts
- Construction
  - Partition the key values
  - How to find the start of each block?
  - Parallelize for each key
Partitioned keys
The block for key k starts at offset c_0 + c_1 + ... + c_{k-1},
where c_j is the global count of key j.
Partitioned keys
Algorithm sketch:
1. Local counting of occurrences of each key
2. Parallel reduction to obtain global counts
3. Prefix summation of counts to obtain block offsets
4. Parallel construction using global counts and block offsets
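The four steps can be sketched as follows for plain integer keys in the range `[0, nkeys)`. This is a minimal illustration under stated assumptions, not the course implementation: the function name `counting_sort`, the parameter `nparts` (number of input partitions), and the flat `local` array of privatized counts are all hypothetical, and `nkeys > 0`, `nparts > 0` are assumed. The OpenMP pragmas are optional; without OpenMP the loops run serially.

```c
#include <stdlib.h>

/* Sort in[0..n-1] (keys in [0, nkeys)) into out[0..n-1].
   Returns 0 on success, -1 if memory allocation fails. */
int counting_sort(int n, const int in[], int out[], int nkeys, int nparts)
{
    int *local  = calloc((size_t)nparts * nkeys, sizeof *local);
    int *count  = calloc((size_t)nkeys, sizeof *count);
    int *offset = malloc((size_t)nkeys * sizeof *offset);
    if (!local || !count || !offset) {
        free(local); free(count); free(offset);
        return -1;
    }

    /* 1. Local counting: partition p scans its slice of the input
          and updates only its private row local[p][*]. */
    #pragma omp parallel for
    for (int p = 0; p < nparts; ++p) {
        int lo = (int)((long long)p * n / nparts);
        int hi = (int)((long long)(p + 1) * n / nparts);
        for (int i = lo; i < hi; ++i)
            ++local[p * nkeys + in[i]];
    }

    /* 2. Reduction: sum the private counts, in parallel over keys. */
    #pragma omp parallel for
    for (int k = 0; k < nkeys; ++k)
        for (int p = 0; p < nparts; ++p)
            count[k] += local[p * nkeys + k];

    /* 3. Exclusive prefix sum: offset[k] = count[0] + ... + count[k-1]. */
    int sum = 0;
    for (int k = 0; k < nkeys; ++k) {
        offset[k] = sum;
        sum += count[k];
    }

    /* 4. Construction: each key fills its own output block, so the
          blocks can be written in parallel without conflicts. */
    #pragma omp parallel for
    for (int k = 0; k < nkeys; ++k)
        for (int i = 0; i < count[k]; ++i)
            out[offset[k] + i] = k;

    free(local); free(count); free(offset);
    return 0;
}
```

The prefix sum in step 3 is kept serial here because it is short when `nkeys` is small; it is the step that answers how to find the start of each block.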
Part IV
Radix sort
Parallel radix sort
Overview:
1. Count occurrences of each digit/process pair
   - cnt[q][d]: Occurrences of digit d on process q
   - Local computation
2. Find global block offset for each digit/process pair
   - off[q][d]: Block offset for digit d on process q
   - Prefix sum of cnt in column-major order
3. Find destination index for each input element
   - indx[q][k]: Destination for element k on process q
   - (How?)
4. Apply the input-to-output permutation
   - output[indx[q][k]] = input[q][k]
   - Permutation / all-to-all
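One pass over a single digit can be sketched as below, simulating the processes serially. This is an illustration under assumptions: the function name `radix_pass`, the flat row-major storage of `cnt`, `off`, and `indx`, and the `shift`/`bits` digit selection are not from the slides. For step 3 it uses one possible answer to "(How?)": each element takes the current offset of its digit on its process, and that offset is then advanced, which keeps the pass stable.

```c
#include <stdlib.h>

/* One radix-sort pass on the digit (x >> shift) & (2^bits - 1),
   simulating p processes with m elements each (in and out hold
   p * m elements). Also reports the destination indices in indx.
   Returns 0 on success, -1 if memory allocation fails. */
int radix_pass(int p, int m, const unsigned in[], unsigned out[],
               int indx[], int shift, int bits)
{
    int radix = 1 << bits;
    unsigned mask = (unsigned)radix - 1u;
    int *cnt = calloc((size_t)p * radix, sizeof *cnt);
    int *off = malloc((size_t)p * radix * sizeof *off);
    if (!cnt || !off) { free(cnt); free(off); return -1; }

    /* 1. cnt[q][d]: occurrences of digit d on process q (local). */
    for (int q = 0; q < p; ++q)
        for (int k = 0; k < m; ++k)
            ++cnt[q * radix + (int)((in[q * m + k] >> shift) & mask)];

    /* 2. off[q][d]: prefix sum of cnt in column-major order, i.e.
          digit 0 on all processes, then digit 1, and so on. */
    int sum = 0;
    for (int d = 0; d < radix; ++d)
        for (int q = 0; q < p; ++q) {
            off[q * radix + d] = sum;
            sum += cnt[q * radix + d];
        }

    /* 3. indx[q][k]: take the next free slot of the element's digit
          on process q; off is consumed as a running counter here. */
    for (int q = 0; q < p; ++q)
        for (int k = 0; k < m; ++k) {
            int d = (int)((in[q * m + k] >> shift) & mask);
            indx[q * m + k] = off[q * radix + d]++;
        }

    /* 4. Apply the permutation (an all-to-all in a distributed run). */
    for (int i = 0; i < p * m; ++i)
        out[indx[i]] = in[i];

    free(cnt);
    free(off);
    return 0;
}
```

A full sort would repeat the pass for `shift = 0, bits, 2 * bits, ...`, swapping the roles of `in` and `out` between passes.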
Parallel radix sort: Example
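Input (two processes, sorting on the least significant bit)
101, 010, 000, 110 || 011, 001, 100, 111

Rows are processes q = 0, 1; columns of cnt and off are digits d = 0, 1.

cnt  = [ 3 1 ]
       [ 1 3 ]

off  = [ 0 4 ]
       [ 3 5 ]

indx = [ 4 0 1 2 ]
       [ 5 6 3 7 ]

Output
010, 000, 110, 100 || 101, 011, 001, 111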
The End!