Sorting
Applications of Sorting
Uniqueness testing
Deleting duplicates
Prioritizing events
Frequency counting
Reconstructing the original order
Set intersection/union
Finding a target pair x, y such that x+y = z (see the sketch after this list)
Efficient searching
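As an example of the pair-finding application, here is a sketch (added for illustration, not from the slides) that sorts the array first and then scans it from both ends:

    #include <algorithm>
    #include <vector>
    using namespace std;

    // Sketch: after sorting, find x and y in v such that x + y == z, in O(n log n) total.
    bool findPair(vector<int> v, int z, int& x, int& y) {
        sort(v.begin(), v.end());              // sort once: O(n log n)
        int lo = 0, hi = (int)v.size() - 1;
        while (lo < hi) {                      // two-pointer scan: O(n)
            int sum = v[lo] + v[hi];
            if (sum == z) { x = v[lo]; y = v[hi]; return true; }
            if (sum < z) lo++;                 // need a larger sum: advance the left pointer
            else         hi--;                 // need a smaller sum: retreat the right pointer
        }
        return false;                          // no such pair exists
    }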
Selection Sort
Selection Sort: Idea
Given an array of n items
1. Find the largest item x in the range [0…n−1]
2. Swap x with the (n−1)th item
3. Reduce n by 1 and go to Step 1
Selection Sort: Illustration
Example: 29 10 14 37 13
Q: How to find the largest?
Pass 1: 37 is the largest; swap it with the last item, 13:  29 10 14 13 37
Pass 2: 29 is the largest among the unsorted items; swap it with 13:  13 10 14 29 37
Pass 3: 14 is the largest among the unsorted items; it is already in place:  13 10 14 29 37
Pass 4: 13 is the largest among the unsorted items; swap it with 10:  10 13 14 29 37  Sorted!
We can also find the smallest and put it at the front instead (see the sketch below)
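A possible variant that repeatedly selects the smallest item and puts it at the front (a sketch added here for illustration, not the slides' implementation):

    #include <algorithm>   // std::swap

    // Sketch: selection sort that finds the minimum of a[i..n-1]
    // and places it at position i, growing the sorted prefix.
    void selectionSortMin(int a[], int n) {
        for (int i = 0; i < n - 1; i++) {
            int minIdx = i;
            for (int j = i + 1; j < n; j++)
                if (a[j] < a[minIdx])
                    minIdx = j;               // remember the smallest item seen so far
            std::swap(a[i], a[minIdx]);       // put it at the front of the unsorted part
        }
    }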
Selection Sort: Implementation
void selectionSort(int a[], int n) {
    for (int i = n-1; i >= 1; i--) {
        int maxIdx = i;
        for (int j = 0; j < i; j++)      // Step 1: search for the maximum element in a[0..i]
            if (a[j] >= a[maxIdx])
                maxIdx = j;
        swap(a[i], a[maxIdx]);           // Step 2: swap the maximum element with the last item a[i]
    }
}
Selection Sort: Analysis
void selectionSort(int a[], int n) {        // Number of times executed
    for (int i = n-1; i >= 1; i--) {
        int maxIdx = i;                     //   n−1
        for (int j = 0; j < i; j++)
            if (a[j] >= a[maxIdx])          //   (n−1)+(n−2)+…+1 = n(n−1)/2
                maxIdx = j;
        swap(a[i], a[maxIdx]);              //   n−1
    }
}
c1 and c2 are the costs of the statements in the outer and inner blocks
Total = c1(n−1) + c2·n(n−1)/2 = O(n²)
Bubble Sort
Bubble Sort: Idea
Given an array of n items
1. Compare pair of adjacent items
2. Swap if the items are out of order
3. Repeat until the end of array
The largest item will be at the last position
4. Reduce n by 1 and go to Step 1
Analogy
A large item is like a “bubble” that floats to the end of the array
Bubble Sort: Illustration
(Legend: sorted items; pair of items under comparison)
Bubble Sort: Implementation
void bubbleSort(int a[], int n) {
    for (int i = n-1; i >= 1; i--) {
        for (int j = 1; j <= i; j++) {     // Step 1: compare adjacent pairs of numbers
            if (a[j-1] > a[j])
                swap(a[j], a[j-1]);        // Step 2: swap if the items are out of order
        }
    }
}
Bubble Sort: Analysis
One iteration of the inner loop (test and swap) requires time bounded by a constant c
Two nested loops
Outer loop: n−1 iterations (i runs from n−1 down to 1)
Inner loop: i iterations for each value of i
when i = n−1: n−1 iterations
when i = n−2: n−2 iterations
…
when i = 1: 1 iteration
Total number of iterations = 1+2+…+(n−1) = n(n−1)/2
Total time = c·n(n−1)/2 = O(n²)
Bubble Sort: Early Termination
Bubble Sort is inefficient, with an O(n²) time complexity
However, it has an interesting property
Given the following array, how many times will the inner loop swap a pair of items?
3 6 11 25 39
Idea
If we go through the inner loop with no swapping, the array is sorted and we can stop early!
Bubble Sort v2.0: Implementation
void bubbleSort2(int a[], int n) {
    for (int i = n-1; i >= 1; i--) {
        bool is_sorted = true;            // assume the array is sorted before the inner loop
        for (int j = 1; j <= i; j++) {
            if (a[j-1] > a[j]) {
                swap(a[j], a[j-1]);
                is_sorted = false;        // any swapping invalidates the assumption
            }
        } // end of inner loop
        if (is_sorted) return;            // if the flag remains true after the inner loop: sorted!
    }
}
Bubble Sort v2.0: Analysis
Worst-case
Input is in descending order
Running time remains the same: O(n²)
Best-case
Input is already in ascending order
The algorithm returns after a single outer iteration
Running time: O(n)
Insertion Sort
Insertion Sort: Idea
Similar to how most people arrange a hand of
poker cards
Start with one card in your hand
Pick the next card and insert it into its proper sorted
order
Repeat previous step for all cards
Insertion Sort: Illustration
Start:       40 | 13 20 8      (sorted part | unsorted part)
Iteration 1: 13 40 | 20 8      (13 inserted into the sorted part)
Iteration 2: 13 20 40 | 8      (20 inserted into the sorted part)
Iteration 3: 8 13 20 40        (8 inserted; the array is sorted)
Insertion Sort: Implementation
void insertionSort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int next = a[i];                 // next is the item to be inserted
        int j;
        for (j = i-1; j >= 0 && a[j] > next; j--)
            a[j+1] = a[j];               // shift sorted items to the right to make room
        a[j+1] = next;                   // insert next into its correct position
    }
}
Insertion Sort: Analysis
Outer loop executes (n−1) times
Number of times the inner loop executes depends on the input
Best case: the array is already sorted and (a[j] > next) is always false
No shifting of data is necessary; running time is O(n)
Worst case: the array is sorted in reverse order and (a[j] > next) is always true
Insertion always occurs at the front, so the inner loop runs i times; running time is O(n²)
Merge Sort
Merge Sort: Idea
Divide the array into two halves, sort each half, and then merge the two sorted halves into one sorted array
Example: 7 2 6 3 8 4 5  →  2 3 4 5 6 7 8
Question
How should we sort the halves in the 2nd step?
Merge Sort: Implementation
void mergeSort(int a[], int low, int high) {   // merge sort on a[low...high]
    if (low < high) {
        int mid = (low+high) / 2;
        mergeSort(a, low,   mid);              // divide a[] into two halves
        mergeSort(a, mid+1, high);             // and recursively sort them
        merge(a, low, mid, high);              // conquer: merge the sorted halves a[low…mid]
    }                                          // and a[mid+1…high] into a[low…high]
}
Note
mergeSort() is a recursive function
low >= high is the base case, i.e. there is 0 or 1 item
Merge Sort: Example
Input: 38 16 27 39 12 27
Divide phase (recursive calls to mergeSort()):
  38 16 27 39 12 27  →  38 16 27 | 39 12 27  →  38 16 | 27 and 39 12 | 27  →  single items
Conquer phase (merge steps):
  38 16 → 16 38;  39 12 → 12 39
  16 38 | 27 → 16 27 38;  12 39 | 27 → 12 27 39
  16 27 38 | 12 27 39 → 12 16 27 27 38 39
Merge Sort: Merge
Merging the two sorted halves a[0..2] = {2, 4, 5} and a[3..5] = {3, 7, 8} into a temporary array b[0..5]:
  b: 2
  b: 2 3
  b: 2 3 4
  b: 2 3 4 5
  b: 2 3 4 5 7 8
At each step, the front items of the two unmerged halves are compared and the smaller one is copied into b[]
Question
Why do we need a temporary array b[]?
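A possible implementation of the merge() routine used above (a sketch, assuming it merges a[low..mid] and a[mid+1..high] through a temporary buffer, as described in the slides):

    #include <vector>

    // Sketch: merge the sorted halves a[low..mid] and a[mid+1..high]
    // into a[low..high], using a temporary array b.
    void merge(int a[], int low, int mid, int high) {
        std::vector<int> b;
        b.reserve(high - low + 1);
        int left = low, right = mid + 1;
        while (left <= mid && right <= high) {         // compare the front items of both halves
            if (a[left] <= a[right]) b.push_back(a[left++]);
            else                     b.push_back(a[right++]);
        }
        while (left <= mid)   b.push_back(a[left++]);  // copy any remaining items
        while (right <= high) b.push_back(a[right++]);
        for (int k = 0; k < (int)b.size(); k++)        // copy the merged result back into a[]
            a[low + k] = b[k];
    }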
Merge Sort: Analysis
In mergeSort(), the bulk of work is done in the
merge step
For merge(a, low, mid, high)
Let total items = k = (high − low + 1)
Number of comparisons ≤ k − 1
Number of moves from original array to temporary array = k
Number of moves from temporary array back to original
array = k
In total, number of operations ≤ 3k − 1 = O(k)
The important question is
How many times is merge() called?
Merge Sort: Analysis
Level 0: 1 call to mergeSort() with n items
Level 1: 2 calls to mergeSort() with n/2 items each
Level 2: 2² calls to mergeSort() with n/2² items each
…
The recursion stops when n/2^k = 1, i.e. n = 2^k, so there are k = lg n levels
Merge Sort: Analysis
Level 0: 0 calls to merge()
Level 1: 1 call to merge() with n/2 items in each half: O(1 × 2 × n/2) = O(n) time
Level 2: 2 calls to merge() with n/2² items in each half: O(2 × 2 × n/2²) = O(n) time
Level 3: 2² calls to merge() with n/2³ items in each half: O(2² × 2 × n/2³) = O(n) time
…
Level lg n: 2^(lg n − 1) (= n/2) calls to merge() with n/2^(lg n) (= 1) item in each half: O(n) time
Total time complexity = O(n lg n)
Optimal comparison-based sorting method
Merge Sort: Pros and Cons
Pros
The performance is guaranteed, i.e. unaffected by the original ordering of the input
Suitable for sorting an extremely large number of inputs
Can operate on the input portion by portion
Cons
Not easy to implement
Requires additional storage during the merging operation
O(n) extra memory storage is needed
Quick Sort
Quick Sort: Idea
Quick Sort is a divide-and-conquer algorithm
Divide step
Choose an item p (known as the pivot) and partition the items of a[i...j] into two parts
Items that are smaller than p
Items that are greater than or equal to p
Conquer step
Recursively sort the two parts
Combine step
Do nothing!
Array layout during partitioning: the pivot p at a[i], region S1 (items < p), region S2 (items ≥ p), then the unknown region; m marks the end of S1 and k scans the unknown region
Quick Sort: Partition Algorithm
Initially, regions S1 and S2 are empty
All items excluding p are in the unknown region
For each item a[k] in the unknown region
Compare a[k] with p
If a[k] >= p, put it into S2
Otherwise, put a[k] into S1
Quick Sort: Partition Algorithm
Case 1: a[k] >= p
a[k] joins S2: simply increment k (S2 grows by one item)
Case 2: a[k] < p
Increment m, then swap a[k] with a[m] so that a[k] joins S1, and increment k
Quick Sort: Partition Implementation
int partition(int a[], int i, int j) {
    int p = a[i];                        // p is the pivot; S1 and S2 are empty initially
    int m = i;
    for (int k = i+1; k <= j; k++) {     // go through each element in the unknown region
        if (a[k] < p) {                  // Case 2
            m++;
            swap(a[k], a[m]);
        }
        // else Case 1: do nothing!
    }
    swap(a[i], a[m]);                    // swap the pivot with a[m]
    return m;                            // m is the index of the pivot
}
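The recursive quickSort() driver itself is not shown here; a minimal sketch of how it might use partition(), following the idea described above (an assumption, not the slides' code):

    void quickSort(int a[], int low, int high) {
        if (low < high) {                         // base case: 0 or 1 item is already sorted
            int m = partition(a, low, high);      // a[m] (the pivot) is now in its final place
            quickSort(a, low, m-1);               // recursively sort the items smaller than the pivot
            quickSort(a, m+1, high);              // recursively sort the items >= the pivot
        }
    }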
Quick Sort: Partition Example
Quick Sort: Partition Analysis
There is only a single for-loop
Number of iterations = number of items, n, in the
unknown region
n = high − low
Complexity is O(n)
Note: during partitioning, p = a[i], S1 = a[i+1...m], S2 = a[m+1...j]; S1 is empty when m = i
Worst case: every partition separates only the pivot (1 item) from the remaining n−1 items
Radix Sort
Question
How do we extract the digit used for the current grouping?
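One common way to do this (a sketch added here, not the slides' code), assuming non-negative integers and base-10 digits:

    // Sketch: return the digit of 'value' at position 'pos'
    // (pos = 0 is the least significant digit), in base 10.
    int digitAt(int value, int pos) {
        for (int d = 0; d < pos; d++)
            value /= 10;          // drop the lower-order digits
        return value % 10;        // the lowest remaining digit is the one we want
    }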
Radix Sort: Implementation
void collect(queue<int> digitQ[], vector<int>& v) {
    int i = 0, digit;
    // ... (rest of the function; see the completed sketch below)
}
Basic Idea
Start with digitQ[0]
Place all items into vector v
Repeat with digitQ[1], digitQ[2], ...
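Following the basic idea above, a possible completion of collect() (a sketch; assumes ten queues digitQ[0..9] holding the items grouped by the current digit, and that v already holds n items being overwritten in place):

    #include <queue>
    #include <vector>
    using namespace std;

    // Sketch: concatenate digitQ[0], digitQ[1], ..., digitQ[9] back into v.
    void collect(queue<int> digitQ[], vector<int>& v) {
        int i = 0;
        for (int digit = 0; digit <= 9; digit++) {    // start with digitQ[0], then digitQ[1], ...
            while (!digitQ[digit].empty()) {          // place all items of this queue into v
                v[i++] = digitQ[digit].front();
                digitQ[digit].pop();
            }
        }
    }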
Radix Sort: Analysis
For each iteration
We go through each item once to place it into its group
Then go through the items again to concatenate the groups
Complexity is O(n)
Number of iterations is d, the maximum number
of digits (or maximum number of characters)
Complexity is thus O(dn)
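Putting the pieces together, a possible radixSort() driver (a sketch; digitAt() and collect() are the helpers sketched above, and d is the maximum number of digits):

    #include <queue>
    #include <vector>
    using namespace std;

    // Sketch: least-significant-digit radix sort for non-negative integers with at most d digits.
    void radixSort(vector<int>& v, int d) {
        for (int pos = 0; pos < d; pos++) {        // one pass per digit position
            queue<int> digitQ[10];                 // one queue per possible digit value
            for (int x : v)                        // distribute: group items by the current digit
                digitQ[digitAt(x, pos)].push(x);
            collect(digitQ, v);                    // collect: concatenate the groups back into v
        }
    }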
Properties of Sorting
In-Place Sorting
A sorting algorithm is said to be in-place if it requires only a constant amount (i.e. O(1)) of extra space during the sorting process
Questions
Merge Sort is not in-place, why?
Is Quick Sort in-place?
Is Radix Sort in-place?
Stable Sorting
A sorting algorithm is stable if the relative order
of elements with the same key value is
preserved by the algorithm
Example: Quick Sort is not stable
Input:  1285 5a 150 4746 602 5b 8356   (5a and 5b have the same key value 5)
After partitioning with pivot = 1285:  5b 5a 150 602 1285 4746 8356
The relative order of 5a and 5b is not preserved
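To check stability in practice, one could tag items that share a key and inspect the output order; a sketch using the standard library (added here, not from the slides):

    #include <algorithm>
    #include <cstdio>
    #include <vector>
    using namespace std;

    // Sketch: items with equal keys carry a tag; a stable sort keeps the tag order.
    int main() {
        vector<pair<int,char>> v = {{1285,' '}, {5,'a'}, {150,' '}, {4746,' '},
                                    {602,' '}, {5,'b'}, {8356,' '}};
        stable_sort(v.begin(), v.end(),
                    [](const pair<int,char>& x, const pair<int,char>& y) {
                        return x.first < y.first;       // compare keys only
                    });
        for (auto& p : v) printf("%d%c ", p.first, p.second);  // 5a is printed before 5b
        printf("\n");
        return 0;
    }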
Sorting Algorithms: Summary
(Summary table: for each algorithm, its worst case, best case, whether it is in-place, and whether it is stable)