
This document summarizes four parallel sorting algorithms: odd-even transposition sort, parallel merge sort, counting sort, and radix sort. It provides details on how each algorithm works sequentially and how parallelism can be introduced. It also includes code examples and exercises related to implementing aspects of the sorting algorithms in parallel.


F8: Parallel Sorting Algorithms

5DV011/VT13: Parallel Computer Systems

Lars Karlsson

2013-02-14

Overview

- Odd-even transposition sort
- Parallel merge sort
- Counting sort
- Radix sort

Part I

Odd-Even Transposition Sort

Bubble sort

- Applicability: General
- Idea: Sweep through the list, comparing and swapping adjacent
  pairs. Continue until a full sweep makes no swaps.
- Complexity: $O(n^2)$ in the worst case

Bubble sort

// swap() is assumed to exchange the two ints it points to.
bool did_swap;
do {
    did_swap = false;
    for (int k = 0; k + 1 < n; ++k) {
        if (a[k] > a[k + 1]) {
            swap(&a[k], &a[k + 1]);
            did_swap = true;
        }
    }
} while (did_swap);

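Parallel(?) bubble sort

- Cannot parallelize the inner k-loop
  - Possible loop-carried dependence: iteration k may swap
    a[k + 1], which iteration k + 1 then reads
- Cannot parallelize the outer loop
  - Loop-carried dependence: each sweep works on the result
    of the previous sweep
- But... we can pipeline!
  - Start the next sweep of the inner loop before the current
    sweep completes
  - Might have to start sweeps speculatively, since whether
    another sweep is needed is only known when the current
    one ends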
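
Odd-even transposition sort

- Applicability: General
- Idea: Alternately compare and swap all the even-indexed and
  all the odd-indexed adjacent pairs. The pairs within one
  phase are disjoint, so they can be processed in parallel.
  Continue until an even phase and an odd phase in a row make
  no swaps.
- Complexity: $O(n^2)$ in the worst case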
Odd-even transposition sort

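// One swap-free phase does not imply a sorted array: {1, 3, 2, 4}
// has no swap in the even phase. Stop only after two consecutive
// swap-free phases (one even, one odd).
int pty = 0;    // parity of the current phase: 0 = even, 1 = odd
int idle = 0;   // number of consecutive phases without a swap
while (idle < 2) {
    bool did_swap = false;
    for (int k = pty; k + 1 < n; k += 2) {
        if (a[k] > a[k + 1]) {
            swap(&a[k], &a[k + 1]);
            did_swap = true;
        }
    }
    idle = did_swap ? 0 : idle + 1;
    pty = 1 - pty;
}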
Odd-even transposition sort: OpenMP
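// Stop only after two consecutive swap-free phases (one even,
// one odd); a single swap-free phase does not imply sortedness.
int pty = 0;    // parity of the current phase: 0 = even, 1 = odd
int idle = 0;   // number of consecutive phases without a swap
while (idle < 2) {
    bool did_swap = false;
    // The pairs of one phase are disjoint, so the iterations are
    // independent; did_swap is combined with a logical-or reduction.
    #pragma omp parallel for reduction(||: did_swap)
    for (int k = pty; k + 1 < n; k += 2) {
        if (a[k] > a[k + 1]) {
            swap(&a[k], &a[k + 1]);
            did_swap = true;
        }
    }
    idle = did_swap ? 0 : idle + 1;
    pty = 1 - pty;
}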
Odd-even transposition sort: Naive approach

- Idea:
  - Block distribute the array
  - Perform each inner loop in parallel
  - Communicate to complete the swaps that cross block borders
- Disadvantages:
  - Many synchronization points

Odd-even transposition sort: Hybrid approach

- Idea: Perform an initial local sort, then perform the block
  analog of odd-even transposition sort to build the globally
  sorted output.
- Analysis:
  - $O((n/p) \log(n/p))$ for the local sort phase, using an
    optimal sequential comparison sort algorithm
  - $p$ steps in the global phase
  - $p\,(t_s + t_w\, n/p)$ communication time for the block
    odd-even transposition sort
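
Adding up the terms above gives a rough total running time for the hybrid approach. This is only a sketch under two assumptions that are not stated on the slide: $t_s$ and $t_w$ are read as the message startup time and the per-word transfer time, and each of the $p$ global steps is assumed to spend $\Theta(n/p)$ time merging two local blocks.

```latex
T_p \;\approx\;
\underbrace{\Theta\!\left(\tfrac{n}{p}\log\tfrac{n}{p}\right)}_{\text{local sort}}
\;+\;
\underbrace{p\,\Theta\!\left(\tfrac{n}{p}\right)}_{\text{merging}}
\;+\;
\underbrace{p\left(t_s + t_w\,\tfrac{n}{p}\right)}_{\text{communication}}
\;=\;
\Theta\!\left(\tfrac{n}{p}\log\tfrac{n}{p}\right) + \Theta(n) + p\,t_s + t_w\,n
```

Under these assumptions the global phase costs $\Theta(n)$ no matter how many processes are used, so it eventually dominates the local sort as $p$ grows.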

Block odd-even transposition step

- Two adjacent processes with sorted local arrays
- Goal: Rearrange the elements of the two arrays such that
  the left process has the smallest half of the elements in
  sorted order and the right process has the largest half.
- Exchange local arrays
- On left process:
  - Merge and keep smallest half
- On right process:
  - Merge and keep largest half

Exercise: Sequential merge

1. Write a sequential linear-time program that takes two
   sorted arrays (a[] and b[]) of length n and merges them
   into one length 2n array (c[]).
2. Write a version that keeps only the lower half.
3. Write a version that keeps only the upper half.
Merge
// Merge the sorted arrays a[0..n-1] and b[0..n-1] into c[0..2n-1].
void
merge(int n, int a[], int b[], int c[])
{
    int ka = 0, kb = 0, kc = 0;
    // Take the smaller head element while both arrays are non-empty.
    while (ka < n && kb < n) {
        if (a[ka] < b[kb])
            c[kc++] = a[ka++];
        else
            c[kc++] = b[kb++];
    }
    // Copy whatever remains of a[] or b[].
    while (ka < n)
        c[kc++] = a[ka++];
    while (kb < n)
        c[kc++] = b[kb++];
}
Merge but keep only lower half

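// Merge from the front and stop after the n smallest elements.
// c[] has length n. Since ka + kb == kc < n inside the loop,
// neither index can run past the end of its array.
void
merge_lower(int n, int a[], int b[], int c[])
{
    int ka = 0, kb = 0, kc = 0;
    while (kc < n) {
        if (a[ka] < b[kb])
            c[kc++] = a[ka++];
        else
            c[kc++] = b[kb++];
    }
}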
Merge but keep only upper half

// Merge from the back and stop after the n largest elements.
// c[] has length n and is filled from its last position down.
void
merge_upper(int n, int a[], int b[], int c[])
{
    int ka = n - 1, kb = n - 1, kc = n - 1;
    while (kc >= 0) {
        if (a[ka] > b[kb])
            c[kc--] = a[ka--];
        else
            c[kc--] = b[kb--];
    }
}
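
To see how the local sort and the block step fit together, the hybrid approach can be simulated sequentially. The sketch below is not a message-passing program: it treats `p` consecutive blocks of length `m` in one array as the local arrays of `p` processes, and the names `compare_split` and `block_odd_even_sort` are made up for this illustration. It assumes that n = p * m, that `qsort` stands in for the local sort, and that `p` alternating steps are performed, as in the analysis of the hybrid approach.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: ascending ints. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Block step for one pair of neighbours. left and right are sorted
   blocks of length m; afterwards left holds the m smallest and right
   the m largest of the 2m elements, both sorted. tmp has room for 2m. */
static void compare_split(int m, int *left, int *right, int *tmp)
{
    int ka = 0, kb = 0;
    for (int kc = 0; kc < 2 * m; ++kc) {
        if (kb == m || (ka < m && left[ka] <= right[kb]))
            tmp[kc] = left[ka++];
        else
            tmp[kc] = right[kb++];
    }
    memcpy(left, tmp, (size_t)m * sizeof *tmp);
    memcpy(right, tmp + m, (size_t)m * sizeof *tmp);
}

/* Sequential simulation (hypothetical helper): block q of a[] plays
   the role of the local array of process q. */
void block_odd_even_sort(int p, int m, int *a)
{
    assert(p > 0 && m > 0);
    int *tmp = malloc(2 * (size_t)m * sizeof *tmp);
    assert(tmp != NULL);

    /* Local phase: every "process" sorts its own block. */
    for (int q = 0; q < p; ++q)
        qsort(a + q * m, (size_t)m, sizeof *a, cmp_int);

    /* Global phase: p steps, alternating between the pairs
       (0,1), (2,3), ... and the pairs (1,2), (3,4), ... */
    for (int step = 0; step < p; ++step)
        for (int q = step % 2; q + 1 < p; q += 2)
            compare_split(m, a + q * m, a + (q + 1) * m, tmp);

    free(tmp);
}
```

For example, calling `block_odd_even_sort(4, 3, a)` on a 12-element array sorts it using four simulated processes. In a real message-passing version, the two neighbours would exchange their blocks and each would keep only its own half instead of sharing one merged buffer.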

Part II

Parallel Merge Sort

Parallel merge sort

- The hybrid local/global approach can be applied to other
  sorts
  - The local phase remains the same
  - The global phase is altered
- Parallel merge sort
  - Local phase: Any local sort
  - Global phase: Pairwise merging in a binary tree
    - $\log_2 p$ steps (down from $p$ in odd-even)
    - But... the steps become increasingly expensive: the
      merged arrays double in length at every level while
      half of the remaining processes drop out

Exercise: Parallel merge sort

1. Write a message-passing program that performs the
   communications necessary for parallel merge sort.
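
As a starting point for the exercise, the communication pattern of the binary merge tree can be worked out without any message-passing library. The sketch below only computes who talks to whom. The names `merge_tree_steps`, `merge_tree_role` and `enum role` are invented for this illustration, and the pairing rule (in step s, every rank that is a multiple of 2^(s+1) receives from rank + 2^s) is one common choice, not necessarily the one intended by the exercise.

```c
#include <assert.h>

enum role { ROLE_IDLE, ROLE_SEND, ROLE_RECV };

/* Number of tree levels needed for p processes: ceil(log2(p)). */
int merge_tree_steps(int p)
{
    assert(p > 0);
    int s = 0;
    while ((1 << s) < p)
        ++s;
    return s;
}

/* Role of process `rank` in step s = 0, 1, ... of the merge tree.
   *partner receives the peer's rank, or -1 when the process is idle. */
enum role merge_tree_role(int rank, int p, int s, int *partner)
{
    assert(0 <= rank && rank < p && s >= 0);
    int stride = 1 << s;
    *partner = -1;

    if (rank % stride != 0)
        return ROLE_IDLE;       /* already sent its data in an earlier step */

    if (rank % (2 * stride) == 0) {
        if (rank + stride < p) {
            *partner = rank + stride;
            return ROLE_RECV;   /* receive the partner's array and merge it */
        }
        return ROLE_IDLE;       /* no partner (p is not a power of two) */
    }

    *partner = rank - stride;
    return ROLE_SEND;           /* send the local array, then drop out */
}
```

With p = 4, step 0 pairs ranks (0, 1) and (2, 3), and step 1 pairs ranks (0, 2), so rank 0 ends up with the full sorted array. In an MPI program the `ROLE_SEND` and `ROLE_RECV` cases would map to a send and a matching receive followed by a merge.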

Part III

Parallel Counting Sort

Counting sort

- Applicability: When the keys are integers drawn from a
  small range, such as 8-bit numbers, and therefore contain
  many duplicates.
- Idea: Count the occurrences of each key and construct the
  output array from scratch using these counts.
- Complexity: $O(n + k)$, where
  - $n$ is the input length
  - $k$ is the size of the key range

Counting sort

int counts[256] = { 0 };

// Count occurrences of each key.
for (int k = 0; k < n; ++k) {
    counts[input[k]] += 1;
}

// Construct output array from counts.
int r = 0;
for (int v = 0; v < 256; ++v) {
    for (int k = 0; k < counts[v]; ++k) {
        output[r++] = v;
    }
}
Parallel counting sort

Sources of parallelism:
- Counting
  - Partition the input array
  - Count occurrences with privatized counts
  - Reduce privatized counts to global counts
- Construction
  - Partition the key values
    - How to find the start of each block?
  - Parallelize for each key
Partitioned keys

- How to find the start of each block?
  - $c_k$: Number of occurrences of key $k$
  - Assume for simplicity: $k \in \{0, 1, \dots, 255\}$
  - Start of block for key $k$:

    $$c_0 + c_1 + \dots + c_{k-1}$$

  - Recognize a prefix sum
Partitioned keys

Algorithm sketch:
1. Local counting of occurrences of each key
2. Parallel reduction to obtain global counts
3. Prefix summation of counts to obtain block offsets
4. Parallel construction using global counts and block offsets
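
The four steps above can be sketched for shared memory with OpenMP. This is one possible realization, not the slides' own code: the function name `parallel_counting_sort` is made up, the keys are assumed to be 8-bit, and the array-section reduction `counts[:NKEYS]` assumes a compiler with OpenMP 4.5 or later. Compiled without OpenMP support, the pragmas are ignored and the code runs sequentially with the same result.

```c
#include <assert.h>
#include <stddef.h>

#define NKEYS 256   /* size of the key range for 8-bit keys */

/* Sorts the n keys in input[] into output[] (hypothetical helper). */
void parallel_counting_sort(size_t n, const unsigned char *input,
                            unsigned char *output)
{
    assert(n == 0 || (input != NULL && output != NULL));
    size_t counts[NKEYS] = { 0 };
    size_t offset[NKEYS];

    /* Steps 1 and 2: every thread counts into a private copy of
       counts[]; the reduction clause sums the copies afterwards. */
    #pragma omp parallel for reduction(+ : counts[:NKEYS])
    for (size_t k = 0; k < n; ++k)
        counts[input[k]] += 1;

    /* Step 3: an exclusive prefix sum over the counts gives the
       start of each key's block. Done sequentially here because
       there are only NKEYS entries. */
    size_t sum = 0;
    for (int v = 0; v < NKEYS; ++v) {
        offset[v] = sum;
        sum += counts[v];
    }

    /* Step 4: the blocks are disjoint, so different keys can be
       written by different threads without synchronization. */
    #pragma omp parallel for
    for (int v = 0; v < NKEYS; ++v)
        for (size_t k = 0; k < counts[v]; ++k)
            output[offset[v] + k] = (unsigned char)v;
}
```

Parallelizing step 4 over the keys can be unbalanced when a few keys dominate the input; a dynamic loop schedule or a split of the output range instead of the key range may help in that case.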

Part IV

Parallel Radix Sort

Radix sort

- Applicability: The keys are to be sorted in lexicographic
  order and the key length (in bits) is small
- Idea: Sort the keys digit by digit, from the least
  significant digit to the most significant, using a stable
  sort for each digit
- Complexity: $O(kn)$, where
  - $n$ is the input length
  - $k$ is the number of digits per key
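
Radix sort

Input   Step 1   Step 2   Step 3   Output
101     101      010      000      000
011     011      110      100      001
010     010      000      101      010
110     110      100      001      011
000     000      101      010      100
001     001      011      110      101
100     100      001      011      110
111     111      111      111      111

The Step 1 column repeats the input. Each later column is the
previous column after a stable sort on the next bit, starting
from the least significant bit.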
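
The table can be reproduced with a short sequential radix sort that uses one-bit digits. This is a minimal sketch, assuming keys stored as `unsigned` and a stable two-bucket counting sort per bit; the function name `radix_sort_bits` is invented for the illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Sorts n keys that use only their lowest `bits` bits. One pass per
   bit, least significant bit first; each pass is a stable counting
   sort with two buckets. */
void radix_sort_bits(size_t n, unsigned *a, int bits)
{
    assert(bits >= 0 && bits <= (int)(sizeof(unsigned) * CHAR_BIT));
    unsigned *tmp = malloc((n ? n : 1) * sizeof *tmp);
    assert(tmp != NULL);

    for (int d = 0; d < bits; ++d) {
        /* Count the keys whose bit d is 0. */
        size_t zeros = 0;
        for (size_t k = 0; k < n; ++k)
            zeros += ((a[k] >> d) & 1u) == 0;

        /* Bucket 0 starts at index 0, bucket 1 right after it. */
        size_t next[2] = { 0, zeros };

        /* Scatter in input order, which keeps the pass stable. */
        for (size_t k = 0; k < n; ++k) {
            unsigned digit = (a[k] >> d) & 1u;
            tmp[next[digit]++] = a[k];
        }
        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
}
```

Calling it with `bits = 1` on the input column yields the Step 2 column, and `bits = 3` yields the Output column. Stability of each pass is what keeps the order established by the earlier, less significant bits.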

Parallel radix sort
Overview:
1. Count occurrences of each digit/process pair
   - cnt[ q ][ d ]: Occurrences of digit d on process q
   - Local computation
2. Find global block offset for each digit/process pair
   - off[ q ][ d ]: Block offset for digit d on process q
   - Prefix sum of cnt in column-major order
3. Find destination index for each input element
   - indx[ q ][ k ]: Destination for element k on process q
   - (How?)
4. Apply the input-to-output permutation
   - output[ indx[ q ][ k ] ] = input[ q ][ k ]
   - Permutation / all-to-all
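
Parallel radix sort: Example

Input

101, 010, 000, 110 || 011, 001, 100, 111

(One pass on the least significant bit. Rows are indexed by
process q; columns by digit d for cnt and off, and by local
element k for indx.)

$$\mathrm{cnt} = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$$

$$\mathrm{off} = \begin{bmatrix} 0 & 4 \\ 3 & 5 \end{bmatrix}$$

$$\mathrm{indx} = \begin{bmatrix} 4 & 0 & 1 & 2 \\ 5 & 6 & 3 & 7 \end{bmatrix}$$

Output

010, 000, 110, 100 || 101, 011, 001, 111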
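
The numbers in this example can be checked with a sequential simulation of one pass. The sketch below is one possible answer to the "(How?)" in step 3, not necessarily the intended one: the j-th element with digit d on process q is sent to off[q][d] + j, which is implemented by advancing off[q][d] after each use. The name `radix_pass` and the flat storage of cnt and off are choices made for this illustration, and the digits are assumed to be single bits.

```c
#include <assert.h>
#include <stdlib.h>

/* One pass of parallel radix sort on binary digit `bit`, simulated
   sequentially for p processes that each own m consecutive elements
   of input[]. Writes every element's destination to indx[] and the
   permuted keys to output[]. */
void radix_pass(int p, int m, int bit,
                const unsigned *input, unsigned *output, int *indx)
{
    assert(p > 0 && m > 0 && bit >= 0);
    /* cnt[2*q + d] and off[2*q + d] store cnt[q][d] and off[q][d]. */
    int *cnt = calloc(2 * (size_t)p, sizeof *cnt);
    int *off = malloc(2 * (size_t)p * sizeof *off);
    assert(cnt != NULL && off != NULL);

    /* Step 1: local counting of each digit on each process. */
    for (int q = 0; q < p; ++q)
        for (int k = 0; k < m; ++k)
            cnt[2 * q + (int)((input[q * m + k] >> bit) & 1u)] += 1;

    /* Step 2: exclusive prefix sum in column-major order, that is,
       digit in the outer loop and process in the inner loop. */
    int sum = 0;
    for (int d = 0; d < 2; ++d)
        for (int q = 0; q < p; ++q) {
            off[2 * q + d] = sum;
            sum += cnt[2 * q + d];
        }

    /* Steps 3 and 4: visit local elements in order, so that equal
       digits keep their relative order, and apply the permutation. */
    for (int q = 0; q < p; ++q)
        for (int k = 0; k < m; ++k) {
            int g = q * m + k;                      /* global index */
            int d = (int)((input[g] >> bit) & 1u);
            indx[g] = off[2 * q + d]++;
            output[indx[g]] = input[g];
        }

    free(cnt);
    free(off);
}
```

Running it with p = 2, m = 4, bit = 0 on the input above yields the indx and output values shown in the example. In a distributed implementation, step 2 would be a parallel prefix sum and step 4 an all-to-all exchange.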

The End!