
Sorting: What Makes It Hard? Chapter 7 in DS&AA Chapter 8 in DS&PS

The document discusses various sorting algorithms and their analysis. It introduces insertion sort, bubble sort, selection sort, merge sort, and quicksort. Merge sort and quicksort have average time complexities of O(n log n), which is optimal for comparison-based sorting. Quicksort is often the fastest in practice but can degrade to O(n^2) in the worst case. The document also discusses sorting for large external datasets using techniques like multi-way merging and polyphase merging.

Uploaded by

kbr_raj
Copyright
© Attribution Non-Commercial (BY-NC)

Sorting

What makes it hard?

Chapter 7 in DS&AA
Chapter 8 in DS&PS
Insertion Sort

• Algorithm
– Conceptually, incrementally add each element to a sorted array
or list, starting with an empty array (list).
– Works as an incremental or a batch algorithm.
• Analysis
– In the best case, the input is sorted: time is O(N).
– In the worst case, the input is reverse sorted: time is O(N^2).
– The average case (by a loose argument) is O(N^2).
• Inversion: a pair of elements that is out of order
– critical variable for determining algorithm time-cost
– each swap of adjacent elements removes exactly 1 inversion
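A minimal Python sketch of the idea above, assuming a list of comparable items. The function names `insertion_sort` and `count_inversions` are illustrative only. The sort counts its adjacent swaps so the count can be compared with the number of inversions in the input.

```python
def insertion_sort(items):
    """Return (sorted copy, number of adjacent swaps performed)."""
    a = list(items)
    swaps = 0
    for i in range(1, len(a)):
        j = i
        # Move a[i] left past every larger element, one adjacent swap at a time.
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            swaps += 1
            j -= 1
    return a, swaps


def count_inversions(items):
    """Brute-force count of pairs (i, j) with i < j and items[i] > items[j]."""
    n = len(items)
    return sum(1 for i in range(n) for j in range(i + 1, n) if items[i] > items[j])
```

On sorted input the inner loop never runs (the O(N) best case); on reverse-sorted input every pair is an inversion, so the swap count is N*(N-1)/2.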
Inversions
• What is average number of inversions, over all inputs?
• Let A be any array of integers
• Let revA be the reverse of A
• Note: if (i,j) are in order in A they are out of order in revA.
And vice versa.
• Total number of pairs (i,j) is N*(N-1)/2, so the average
number of inversions is N*(N-1)/4, which is O(N^2)
• Corollary: any algorithm that removes only a single
inversion at a time takes at least on the order of N^2 time on average!
• To do better, we need to remove more than one inversion
at a time.
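The averaging argument can be checked by brute force for small N. The sketch below enumerates every permutation of N distinct values and averages the inversion counts; the helper names are illustrative, and exact fractions are used so the comparison with N*(N-1)/4 is not affected by rounding.

```python
from fractions import Fraction
from itertools import permutations


def inversions(seq):
    """Number of pairs (i, j) with i < j and seq[i] > seq[j]."""
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


def average_inversions(n):
    """Exact average inversion count over all n! permutations of n distinct values."""
    perms = list(permutations(range(n)))
    return Fraction(sum(inversions(p) for p in perms), len(perms))
```

This is only feasible for small N, because the number of permutations grows as N!.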
BubbleSort

• One of the best-known sorting algorithms
• Algorithm:
for j = n-1 down to 1 … O(n)
for i = 0 to j-1 … O(j)
if A[i] and A[i+1] are out of order, swap them
(that’s the bubble) … O(1)
• Analysis
– Bubblesort is O(n^2)
• Appropriate for small arrays
• Appropriate for nearly sorted arrays
• Comparisons versus swaps?
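To make the comparisons-versus-swaps question concrete, here is a hedged Python sketch that counts both. The early exit when a pass makes no swap is an assumption added here (it is not in the pseudocode above); it is what makes bubble sort cheap on nearly sorted arrays.

```python
def bubble_sort(items):
    """Return (sorted copy, comparisons made, swaps made)."""
    a = list(items)
    comparisons = swaps = 0
    for j in range(len(a) - 1, 0, -1):   # j = n-1 down to 1
        swapped = False
        for i in range(j):               # i = 0 .. j-1, so a[i + 1] stays in range
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
                swapped = True
        if not swapped:                  # assumption: stop once a pass makes no swap
            break
    return a, comparisons, swaps
```

On sorted input of length 4 this makes 3 comparisons and 0 swaps; on reverse-sorted input of length 4 it makes 6 of each.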
Shell Sort: 1959 by Shell

• Motivated by the inversion result: need to move elements
that are far apart
• Still quadratic in the worst case
• Only in text books
• Historical interest and theoretical interest - not fully
understood.
• Algorithm: (with schedule 1, 3, 5)
– bubble sort things spaced 5 apart
– bubble sort things 3 apart
– bubble sort things 1 apart
• Faster than insertion sort, but still O(N^2)
• No one knows the best schedule
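A minimal sketch of the schedule above, applied as gaps 5, then 3, then 1. It uses a gapped insertion sort for each pass, which is the usual formulation; the slide describes the passes as bubble sorts, so treat this as one possible reading. The name `shell_sort` and the `gaps` parameter are illustrative.

```python
def shell_sort(items, gaps=(5, 3, 1)):
    """Return a sorted copy using gapped insertion sort; the last gap must be 1."""
    a = list(items)
    for gap in gaps:
        # Insertion-sort each interleaved subsequence a[k], a[k + gap], a[k + 2*gap], ...
        for i in range(gap, len(a)):
            value = a[i]
            j = i
            while j >= gap and a[j - gap] > value:
                a[j] = a[j - gap]        # shift the larger element gap positions right
                j -= gap
            a[j] = value
    return a
```

The large-gap passes move far-apart elements in one step, so each such move can remove several inversions at once.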
Divide and Conquer: Merge Sort

• Let A be an array of integers of length N
• Define Sort(A) via the recursive helper auxSort(A, 0, N), where
• array[] auxSort(A, low, high) sorts the range A[low..high-1]
– if (high - low <= 1) return that range (0 or 1 elements, already sorted)
– else
• mid = (low+high)/2
• temp1 = auxSort(A, low, mid)
• temp2 = auxSort(A, mid, high)
• return merge(temp1, temp2)
Merge

• int[] Merge(int[] temp1, int[] temp2)
– int[] temp = new int[temp1.length + temp2.length]
– int i = 0, j = 0, k = 0
– while (i < temp1.length and j < temp2.length)
• if (temp1[i] <= temp2[j]) temp[k++] = temp1[i++]
• else temp[k++] = temp2[j++]
– copy whatever remains of temp1 or temp2 into temp
– return temp
• Analysis of Merge:
– time: O( temp1.length+temp2.length)
– memory: O(temp1.length+temp2.length)
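The merge step and the recursive sort can be sketched together in Python. This is an illustration under the assumption that the sort works on the half-open range from `low` up to (not including) `high` and returns a new list; the names `merge` and `merge_sort` are chosen here.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in O(len(left) + len(right))."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # <= keeps equal elements in input order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])              # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(a, low=0, high=None):
    """Return a sorted copy of a[low:high]."""
    if high is None:
        high = len(a)
    if high - low <= 1:                  # 0 or 1 elements: already sorted
        return a[low:high]
    mid = (low + high) // 2
    return merge(merge_sort(a, low, mid), merge_sort(a, mid, high))
```

The temporary lists built by `merge` are the O(N) extra space.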
Analysis of Merge Sort

• Time
– Let N be number of elements
– Number of levels is O(logN)
– At each level, O(N) work
– Total is O(N*logN)
– This is the best possible for comparison-based sorting.
• Space
– At each level, O(N) temporary space
– Space can be freed, but calls to new are costly
– Needs O(N) space
– Bad: better to have an in-place sort
– Quick Sort (chapter 8) is the sort of choice.
Quicksort: Algorithm

• QuickSort: usually the fastest comparison sort in practice
• QuickSort(S)
– 1. If size of S is 0 or 1, return S
– 2. Pick element v in S (pivot)
– 3. Construct L = all elements less than v and
R = all elements greater than v (elements equal to v stay with the pivot).
– 4. Return QuickSort(L), then v and its equals, then QuickSort(R)
• Algorithm can be done in situ (in place).
• On average runs in O(N log N), but can take O(N^2) time
– depends on choice of pivot.
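The four steps can be sketched directly in Python. This version builds new lists rather than working in place, and it picks the pivot at random; both are simplifying assumptions made here for readability, not the in-place variant the slide mentions.

```python
import random


def quicksort(s):
    """Return a sorted copy of list s."""
    if len(s) <= 1:                          # step 1
        return list(s)
    v = random.choice(s)                     # step 2: pivot (random choice is an assumption)
    left = [x for x in s if x < v]           # step 3
    equal = [x for x in s if x == v]         # the pivot and any duplicates of it
    right = [x for x in s if x > v]
    return quicksort(left) + equal + quicksort(right)   # step 4
```

Keeping the `equal` group separate means duplicate keys are not lost and are never passed to a recursive call.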
Quicksort: Analysis

• Worst Case:
– T(N) = worst case sorting time
– T(1) = 1
– if bad pivot, T(N) = T(N-1)+N
– Via Telescope argument (expand and add)
– T(N) = O(N^2)
• Average Case (text argument)
– Assume all subproblem sizes are equally likely
• Note: the chance that the pivot is the ith smallest element is 1/N
– T(N) = average cost to sort N elements
Analysis continued

– T(left branch) = T(right branch) (on average), so
– T(N) = 2*(T(0)+T(1)+…+T(N-1))/N + N, where N is the
cost of partitioning
– Multiply by N:
• NT(N) = 2(T(0)+…+T(N-1)) + N^2 (*)
– Subtract the N-1 case of (*)
• NT(N) - (N-1)T(N-1) = 2T(N-1) + 2N - 1
– Rearrange and drop the -1
• NT(N) = (N+1)T(N-1) + 2N
– Divide by N(N+1)
• T(N)/(N+1) = T(N-1)/N + 2/(N+1)
Last Step

• Substitute N-1, N-2, ..., 2 for N
– T(N-1)/N = T(N-2)/(N-1) + 2/N
– …
– T(2)/3 = T(1)/2 + 2/3
• Add
– T(N)/(N+1) = T(1)/2 + 2(1/3 + 1/4 + … + 1/(N+1))
– = 2(1 + 1/2 + … + 1/(N+1)) - 5/2, since T(1) = 1
– = O(logN)
• Hence T(N) = O(N logN)
• The literature gives a more careful proof.
• For better results, choose pivot as median of 3 random
values.
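The telescoping sum can be checked numerically. The sketch below iterates the simplified recurrence N*T(N) = (N+1)*T(N-1) + 2N, assuming T(1) = 1 as in the worst-case slide, and compares it with the closed form (N+1)*(2*H - 5/2), where H = 1 + 1/2 + … + 1/(N+1). The function names are illustrative; exact fractions avoid rounding error.

```python
from fractions import Fraction


def t_simplified(n):
    """T(n) from n*T(n) = (n+1)*T(n-1) + 2n, with T(1) = 1 (assumed base case)."""
    t = Fraction(1)
    for m in range(2, n + 1):
        t = ((m + 1) * t + 2 * m) / m
    return t


def closed_form(n):
    """(n+1) * (2*H - 5/2), where H = 1 + 1/2 + ... + 1/(n+1)."""
    harmonic = sum(Fraction(1, i) for i in range(1, n + 2))
    return (n + 1) * (2 * harmonic - Fraction(5, 2))
```

Because the harmonic sum grows like log N, the closed form grows like N log N.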
Quickselect: Algorithm

• Problem: find the kth smallest item


• Algorithm: modify Quicksort
– let |S| be the number of elements in S.
• QuickSelect(S, k)
– if |S| = 1, return element in S
– Pick element p in S (the pivot)
– Partition S via p as in QuickSort into L and R
– if k <= |L|, return QuickSelect(L,k)
– if k = |L|+1, return pivot
– otherwise return QuickSelect(R, k - |L|-1)
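A hedged Python sketch of the selection steps, with k counted from 1. It extends the pseudocode slightly to handle duplicate keys by counting how many elements equal the pivot; that extension, the random pivot, and the name `quickselect` are assumptions made here.

```python
import random


def quickselect(s, k):
    """Return the k-th smallest element of list s (k is 1-based)."""
    if not 1 <= k <= len(s):
        raise ValueError("k must be between 1 and len(s)")
    pivot = random.choice(s)
    left = [x for x in s if x < pivot]
    right = [x for x in s if x > pivot]
    n_equal = len(s) - len(left) - len(right)   # the pivot plus any duplicates
    if k <= len(left):
        return quickselect(left, k)
    if k <= len(left) + n_equal:
        return pivot
    return quickselect(right, k - len(left) - n_equal)
```

Unlike quicksort, only one of the two sides is ever explored, which is why the average cost drops to linear.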
Quickselect: Analysis

• Worst Case is O(N^2)


• Average Case: analysis similar to quicksort’s.
• Here T(N) = (T(0)+T(1)+…+T(N-1))/N + N
• Multiply by N
– NT(N) = T(0)+T(1)+…+T(N-1) + N^2
• Substitute N-1 for N and subtract:
– NT(N) - (N-1)T(N-1) = T(N-1) + 2N - 1
• Rearrange, divide by N, and drop the -1/N term
– T(N) = T(N-1) + 2
– T(N) = T(N-2) + 4 = … = T(1) + 2*(N-1) = O(N)
• Average Case: Linear.
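The linear bound can be checked by evaluating the recurrence exactly. The sketch assumes T(0) = 0, a base case the slide does not state; under that assumption the recurrence solves to T(N) = 2N - (1 + 1/2 + … + 1/N), which stays below 2N.

```python
from fractions import Fraction


def t_select(n_max):
    """List of T(0..n_max) from T(N) = (T(0) + ... + T(N-1))/N + N, with T(0) = 0 assumed."""
    t = [Fraction(0)]
    running_sum = Fraction(0)            # T(0) + ... + T(n-1)
    for n in range(1, n_max + 1):
        running_sum += t[n - 1]
        t.append(running_sum / n + n)
    return t


def harmonic(n):
    """Exact value of 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, i) for i in range(1, n + 1))
```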
Bucket Sort

• A linear time sort algorithm!


• Need to know the possible values.
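• Example 1: to sort N non-negative integers less than M.
– Make array A of size M, initialized to zeros
– Read each integer i and update, A[i]++
– Scan A in order, writing out each value i a total of A[i] times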
• Example 2: 200 names
– make array of size 26*26 = 676
– Using first 2 letters of each name, put it in [char-char]
bucket (usually a short ordered linked list)
– Collect them up
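Example 1 can be sketched as a counting sort in Python. The function name and the range check are illustrative additions; the running time is O(N + M) because the count array is scanned once.

```python
def counting_sort(values, m):
    """Sort non-negative integers that are all less than m, in O(N + M) time."""
    counts = [0] * m
    for v in values:
        if not 0 <= v < m:
            raise ValueError("every value must satisfy 0 <= value < m")
        counts[v] += 1                   # A[i]++
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)   # emit each value as often as it was seen
    return result
```

No element is ever compared with another, which is how this sort avoids the comparison-sorting lower bound.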
Radix Sorting (card sorting)

• Uses linked lists


• Idea: Multiple passes of Bucket Sort
• Trick: Iteratively sort by the last character position, then the next to last, etc.
(each pass must keep ties in their existing order)
• Example
input: ed ca xa cd xd bd
pass 1: a:{ca, xa} d:{ed, cd, xd, bd}
result: ca xa ed cd xd bd
pass 2: b:{bd} c:{ca, cd} e:{ed} x:{xa, xd}
result: bd ca cd ed xa xd
• Complexity: O(N* number of passes)
– number of passes = length of key
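The example can be reproduced with a short Python sketch. It assumes every key has the same length and uses only lowercase letters a to z, so 26 buckets are enough; Python lists stand in for the linked lists, and the name `radix_sort` is illustrative.

```python
def radix_sort(words):
    """Sort equal-length lowercase a-z strings, one bucket pass per character position."""
    if not words:
        return []
    width = len(words[0])
    if any(len(w) != width for w in words):
        raise ValueError("all keys must have the same length")
    result = list(words)
    for pos in range(width - 1, -1, -1):         # last position first
        buckets = [[] for _ in range(26)]
        for w in result:
            # Appending preserves the order from the previous pass (stability).
            buckets[ord(w[pos]) - ord("a")].append(w)
        result = [w for bucket in buckets for w in bucket]
    return result
```

Calling `radix_sort(["ed", "ca", "xa", "cd", "xd", "bd"])` goes through the same two passes as the example and returns `["bd", "ca", "cd", "ed", "xa", "xd"]`.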
External Sorting (Tape or CD)

• Idea: merge sort (2-way)


• Suppose memory size is M (enough to sort internally)
• Ta1, Ta2, Tb1, Tb2 are tape drives
• Data on Ta1 (initially)
• Pass 1:
– read M records at a time
– sort each batch and write the batches to Tb1, Tb2 alternately
(each run of M records on Tb1, Tb2 is sorted)
• Pass 2:
– merge runs from Tb1 and Tb2 onto Ta1 and Ta2
– Note that merging takes only O(1) memory
– Each run of 2*M records is sorted
External Sorting

• Continue merging, alternating between writing to Ta1, Ta2 and to Tb1, Tb2.
• Number of merge passes is log(N/M)
• Time complexity is O(N/M * M log(M)) = O(N log(M)) for the first pass
• O(N) for each subsequent pass
• Total: O(N log(M) + N log(N/M)) = O(N log(N))
• With more tapes, can reduce time by doing k-way merge
rather than 2-way merge
• Replace Log base 2 with log base k
• A trickier algorithm (Polyphase) can do it with fewer
tapes.
• Who uses tapes? Algorithm works for CDs
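The pass structure can be simulated in memory. In this sketch, Python lists stand in for the runs on tape, `m` is the assumed memory size in records, and `heapq.merge` from the standard library does the 2-way merge; it models the counting of passes, not real tape I/O.

```python
import heapq


def external_sort(records, m):
    """Simulate 2-way external merge sort; return (sorted list, number of merge passes)."""
    # Pass 1: cut the input into runs of at most m records and sort each in memory.
    runs = [sorted(records[i:i + m]) for i in range(0, len(records), m)]
    passes = 0
    while len(runs) > 1:
        # One merge pass: merge neighbouring runs pairwise, halving the run count.
        runs = [list(heapq.merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        passes += 1
    return (runs[0] if runs else []), passes
```

With 20 records and m = 3 there are 7 initial runs, which merge down to 4, then 2, then 1, so 3 merge passes are needed.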
Lower Bound for Sorting

• Theorem: if you sort by comparisons, then you must use at
least log(N!) comparisons in the worst case. Hence an N logN algorithm is optimal.
• Proof:
– N items can be rearranged in N! ways.
– Consider a decision tree where each internal node is a
comparison.
– Each possible array goes down one path
– Number of leaves is at least N!
– minimum depth of such a decision tree is log(N!)
– log(N!) = log1+log2+…+log(N) grows like N logN
– Proof: use the partition trick
• log(N/2) + log(N/2+1) + … + log(N) >= N/2*log(N/2)
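The bound is easy to evaluate for concrete N. The sketch below computes the smallest whole number of comparisons that is at least log2(N!) using integer arithmetic, and a helper for the partition-trick bound N/2 * log2(N/2); both function names are illustrative.

```python
import math


def min_comparisons(n):
    """Smallest integer d with 2**d >= n!, i.e. the decision-tree depth bound."""
    # (x - 1).bit_length() is the ceiling of log2(x) for any integer x >= 1.
    return (math.factorial(n) - 1).bit_length()


def partition_bound(n):
    """The weaker bound from the partition trick: (n/2) * log2(n/2)."""
    return (n / 2) * math.log2(n / 2)
```

For example, sorting 5 items needs at least 7 comparisons in the worst case, since 5! = 120 lies between 2^6 and 2^7.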
Summary
• For online sorting, use heapsort.
– Online: get elements one at a time
– Offline or Batch: have all elements available
• For small collections, bubble sort is fine
• For large collections, use quicksort
• You may hybridize the algorithms, e.g.
– use quicksort until the size is below some k
– then use bubble sort
• Sorting is important and well-studied and often
inefficiently done.
• Libraries often contain sorting routines, but beware:
the quicksort routine in Visual C++ seems to run in
quadratic time. Java sorts in Collections are fine.
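The hybrid idea from the summary can be sketched as follows. The cutoff value of 8, the random pivot, and the function names are assumptions made for illustration; many implementations use insertion sort rather than bubble sort for the small ranges.

```python
import random


def hybrid_sort(a, cutoff=8):
    """Sort list a in place: quicksort down to small ranges, then bubble sort. Returns a."""

    def bubble(lo, hi):
        # Bubble sort the inclusive range a[lo..hi].
        for j in range(hi, lo, -1):
            for i in range(lo, j):
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]

    def sort(lo, hi):
        while hi - lo + 1 > cutoff:
            pivot = a[random.randint(lo, hi)]
            i, j = lo, hi
            while i <= j:                # partition around the pivot value
                while a[i] < pivot:
                    i += 1
                while a[j] > pivot:
                    j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    i += 1
                    j -= 1
            sort(lo, j)                  # recurse on the left part
            lo = i                       # and loop on the right part
        bubble(lo, hi)

    sort(0, len(a) - 1)
    return a
```

A suitable cutoff depends on the machine and the data, so it is best chosen by measurement.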
