Analysis of Algorithms
Unit 1
Searching, Sorting, Selection
1. Insertion Sort
2. Analyzing Algorithms
3. Heap Sort
4. Non-Comparison Sorting Algorithms
5. Minimum and Maximum
Lesson One
▪ The INSERTION-SORT procedure operates on an array A that starts out containing the
sequence [5, 2, 4, 6, 1, 3].
▪ The index i indicates the "current card" being inserted into the hand.
▪ At the beginning of each iteration of the for loop, which is indexed by i, the
subarray consisting of elements A[1 : i–1] constitutes the currently
sorted hand, and the remaining subarray A[i+1 : n] corresponds to the pile of
cards still on the table.
▪ In fact, elements A[1 : i-1] are the elements originally in positions 1 through i-1,
but now in sorted order.
Loop Invariants of Insertion Sort
▪ Key Idea:
– A loop invariant is a property that holds before every iteration of the loop. To
prove that it holds, we may use established facts other than the invariant itself.
▪ Connection to Induction:
– Base Case: Show the invariant holds before the first iteration.
– Inductive Step: Prove it holds for each iteration.
▪ Why It Matters:
– The third property (termination) ensures correctness: combining the
invariant with the condition that caused the loop to end yields the desired result.
– Unlike ordinary induction, where the inductive step is applied infinitely, here the process stops when the loop ends.
Checking Loop Invariants of Insertion Sort for Correctness of the Algorithm
▪ Termination Step:
– The loop starts with i = 2 and increments by 1 in each iteration.
– The loop terminates when i > n, i.e., when i = n + 1.
– Substituting n + 1 for i in the loop invariant, the subarray A[1 : n] consists of the original elements, but in sorted order; hence the algorithm is correct, as the sketch below illustrates.
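As a concrete companion to this argument, here is a minimal Python sketch of insertion sort (our own translation of the pseudocode, using 0-based indices); the comments note where the sorted "hand" lives.

```python
def insertion_sort(a):
    """Sort list a in place, smallest to largest."""
    for i in range(1, len(a)):           # invariant: a[:i] is the sorted "hand"
        key = a[i]                       # the current card being inserted
        j = i - 1
        while j >= 0 and a[j] > key:     # shift larger elements one slot right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                   # drop the card into its place

nums = [5, 2, 4, 6, 1, 3]
insertion_sort(nums)
print(nums)  # [1, 2, 3, 4, 5, 6]
```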
▪ Precise Formula:
– Initially, we could compute the running time of INSERTION-SORT by
counting the number of executions of each statement, factoring in
the constant cost c_k of each line. This gives a precise formula but
is often complex and cumbersome to work with.
▪ Simplified Notation:
– To make comparisons easier, especially as the input size grows, we
switch to a simpler, more concise notation.
– This big-O notation allows us to express the algorithm’s running
time in a way that highlights how it scales with input size, making
it much easier to compare different algorithms.
Analysis of Insertion Sort
▪ To analyze the INSERTION-SORT procedure, let’s view it with the time cost of
each statement and the number of times each statement is executed.
▪ For each i = 2, 3, ... , n, let t_i denote the number of times the while loop test in
line 5 is executed for that value of i.
▪ When a for or while loop exits in the usual way - because the test in the loop
header comes up FALSE - the test is executed one time more than the loop body.
Analysis of Insertion Sort
▪ The total running time T(n) is the sum of the running times for each statement
executed.
▪ If a statement takes c_k steps to execute and runs m times, it contributes
c_k · m to the total running time.
▪ We usually denote the running time of an algorithm on an input of size n by
T(n). To compute T(n), the running time of INSERTION-SORT on an input of n
values, we sum the products of the cost and times columns, obtaining:

T(n) = c_1·n + c_2(n − 1) + c_4(n − 1) + c_5 Σ_{i=2}^{n} t_i + c_6 Σ_{i=2}^{n} (t_i − 1) + c_7 Σ_{i=2}^{n} (t_i − 1) + c_8(n − 1)
Analysis of Insertion Sort
▪ In the best case, when the array is already sorted, each t_i = 1 and T(n) reduces to
a linear function an + b of n, so the best-case running time is Θ(n).
▪ In the worst case, when the array is in reverse sorted order, each t_i = i and T(n) is
a quadratic function an² + bn + c of n, so the worst-case running time is Θ(n²).
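To connect the formula to the code, the following instrumented sketch (our own addition, not from the text) records t_i, the number of while-loop tests for each i; on a reverse-sorted input every t_i equals i, which is exactly what makes the worst case quadratic.

```python
def insertion_sort_counting(a):
    """Insertion sort that records t_i, the while-test count for each i."""
    tests = {}
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        t = 1                      # the final, failing test is counted too
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            t += 1
        a[j + 1] = key
        tests[i + 1] = t           # report with 1-based i, as in the text
    return tests

print(insertion_sort_counting([6, 5, 4, 3, 2, 1]))
# {2: 2, 3: 3, 4: 4, 5: 5, 6: 6}  -- i.e. t_i = i in the worst case
```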
▪ The term "Heap" was originally coined in the context of heapsort, but
it has since come to refer to "garbage-collected storage", such as the
programming languages Java and Python provide.
▪ Please don’t be confused. The heap data structure is not
garbage-collected storage.
▪ Here we will be using the term "Heap" to refer to the data structure,
not the storage class.
Heap
▪ Heap:
– The (binary) heap data structure is an array object
that we can view as a nearly complete binary tree.
▪ Each node of the tree corresponds to an element of the array.
▪ The tree is filled on all levels except possibly the lowest, which
is filled from the left up to a point.
– A heap is a specialized binary tree-based data
structure that satisfies the heap property:
▪ In a max-heap, the value of each node is greater than or equal
to the values of its children and in a min-heap, the value of
each node is less than or equal to the values of its children.
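A minimal sketch of the array view of a heap, assuming 0-based Python indexing (CLRS uses 1-based indices, so its formulas shift by one); the helper names are our own.

```python
def parent(i):
    return (i - 1) // 2       # CLRS: floor(i/2) with 1-based indexing

def left(i):
    return 2 * i + 1          # CLRS: 2i

def right(i):
    return 2 * i + 2          # CLRS: 2i + 1

def is_max_heap(a):
    """Check the max-heap property: every node >= each of its children."""
    return all(a[parent(i)] >= a[i] for i in range(1, len(a)))

print(is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1]))  # True
```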
Analysis of MAX-HEAPIFY
▪ Let T(n) be the worst-case running time that the MAX-HEAPIFY procedure takes on a
subtree of size at most n.
▪ For a tree rooted at a given node i, the running time is the O(1) time
to fix up the relationships among the elements A[i], A[LEFT(i)], and
A[RIGHT(i)], plus the time to run MAX-HEAPIFY on a subtree rooted at
one of the children of node i (if the recursive call occurs).
▪ The children’s subtrees each have size at most 2n/3, and therefore we
can describe the running time of MAX-HEAPIFY by the recurrence T(n)
≤ T(2n/3) + O(1).
▪ The solution to this recurrence, by case 2 of the master theorem, is
T(n) = O(lg n).
▪ Alternatively, we can characterize the running time of MAX-HEAPIFY
on a node of height h as O(h).
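Here is a hedged Python sketch of MAX-HEAPIFY with 0-based indices; heap_size is passed explicitly so the same array can later hold both the heap and a sorted suffix, as heapsort requires.

```python
def max_heapify(a, i, heap_size):
    """Float a[i] down until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at the children of i are already max-heaps.
    """
    l, r = 2 * i + 1, 2 * i + 2
    largest = i
    if l < heap_size and a[l] > a[largest]:
        largest = l
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]   # O(1) fix-up among i and children
        max_heapify(a, largest, heap_size)    # recurse into one child's subtree
```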
Building a Heap
▪ Loop Invariant: at the start of each iteration of the for loop, each node
i+1, i+2, ... , n is the root of a max-heap.
▪ We need to show that this invariant is true prior to the first loop
iteration, that each iteration of the loop maintains the
invariant, that the loop terminates, and that the invariant
provides a useful property to show correctness when the loop
terminates.
▪ Initialization:
– Prior to the first iteration of the loop, i = floor(n/2). Each node floor(n/2) + 1,
floor(n/2) + 2, ... , n is a leaf and is thus the root of a trivial max-heap.
Correctness of BUILD-MAX-HEAP
▪ Maintenance:
– To see that each iteration maintains the loop invariant, observe that the
children of node i are numbered higher than i.
– By the loop invariant, therefore, they are both roots of max-heaps.
– This is precisely the condition required for the call MAX-HEAPIFY(A, i) to
make node i a max-heap root.
– Moreover, the MAX-HEAPIFY call preserves the property that nodes i+1,
i+2, ... , n are all roots of max-heaps. Decrementing i in the for loop
update reestablishes the loop invariant for the next iteration.
▪ Termination:
– The loop makes exactly floor(n/2) iterations, and so it terminates. At
termination, i = 0. By the loop invariant, each node 1, 2, ... , n is the
root of a max-heap. In particular, node 1 is.
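A Python sketch of BUILD-MAX-HEAP under the same 0-based conventions, reusing the max_heapify sketch above (the loop bound shifts from floor(n/2) down to 1 in CLRS to n//2 − 1 down to 0 here):

```python
def build_max_heap(a):
    """Turn list a into a max-heap in place."""
    n = len(a)
    # Nodes n//2 .. n-1 are leaves, hence already trivial max-heaps,
    # so sweep from the last internal node back to the root.
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(a, i, n)
```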
Analysis of BUILD-MAX-HEAP
▪ Each of the O(n) calls to MAX-HEAPIFY costs O(lg n), giving an easy upper
bound of O(n lg n); a tighter analysis, which charges O(h) to each node of
height h, shows that BUILD-MAX-HEAP actually runs in O(n) time.
▪ The HEAPSORT procedure starts by calling BUILD-MAX-HEAP, exchanges the
maximum A[1] with A[n], shrinks the heap by one node, and restores the
max-heap property with MAX-HEAPIFY. It then repeats this process for the
max-heap of size n-1 down to a heap of size 2.
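A compact sketch of the full procedure, reusing build_max_heap and max_heapify from the sketches above:

```python
def heapsort(a):
    """Sort list a in place using a max-heap."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum into place
        max_heapify(a, 0, end)        # re-heapify the remaining prefix a[:end]

nums = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
heapsort(nums)
print(nums)  # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```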
The operation of HEAPSORT
▪ (a) The max-heap data structure just after BUILD-MAX-HEAP has built it
in line 1.
▪ (b)–(c) The max-heap just after each call of MAX-HEAPIFY in line 5,
showing the value of i at that time. Only blue nodes remain in the
heap. Tan nodes contain the largest values in the array, in sorted order.
A Lower Bound for Comparison Sorts
▪ Theorem:
– Any comparison sort algorithm requires Ω(n lg n) comparisons in the
worst case.
▪ Proof:
– From the preceding slide, it suffices to determine the height of a
decision tree in which each permutation appears as a reachable leaf.
– Consider a decision tree of height h with ℓ reachable leaves
corresponding to a comparison sort on n elements.
– Because each of the n! permutations of the input appears as one or
more leaves, we have n! ≤ ℓ.
– Since a binary tree of height h has no more than 2^h leaves, we have
n! ≤ ℓ ≤ 2^h.
– Taking logarithms, h ≥ lg(n!), which in turn implies h = Ω(n lg n).
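The final implication can be spelled out with an elementary bound on n! (a sketch in LaTeX; any Stirling-type bound works equally well):

```latex
\[
  h \;\ge\; \lg(n!)
    \;\ge\; \lg\!\Bigl(\bigl(\tfrac{n}{2}\bigr)^{n/2}\Bigr)
    \;=\; \tfrac{n}{2}\lg\tfrac{n}{2}
    \;=\; \Omega(n \lg n)
\]
```

since at least n/2 of the n factors in n! are each at least n/2.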
Counting Sort Algorithm
(a) The array A and the auxiliary array C after line 5. (b) The
array C after line 8. (c)–(e) The output array B and the auxiliary
array C after one, two, and three iterations of the loop in lines 11-
13, respectively. Only the tan elements of array B have been filled
in. (f) The final sorted output array B.
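Since the pseudocode itself is not reproduced here, the following Python sketch (our own translation of the textbook procedure) mirrors its steps; the comments map to the line groups cited in the analysis below.

```python
def counting_sort(a, k):
    """Stable sort of a, whose elements lie in range(k); returns array b."""
    n = len(a)
    b = [0] * n
    c = [0] * k                 # lines 2-3: zero the counts
    for x in a:                 # lines 4-5: c[x] = number of elements equal to x
        c[x] += 1
    for j in range(1, k):       # lines 7-8: c[x] = number of elements <= x
        c[j] += c[j - 1]
    for x in reversed(a):       # lines 11-13: place each element, back to front,
        c[x] -= 1               # so equal keys keep their order (stability)
        b[c[x]] = x
    return b

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 6))  # [0, 0, 2, 2, 3, 3, 3, 5]
```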
Analysis of Counting Sort
▪ The for loop of lines 2-3 takes O(k) time. The for loop of lines 4-5
takes O(n) time. The for loop of lines 7-8 takes O(k) time. The for
loop of lines 11-13 takes O(n) time.
▪ Thus, the overall time is O(k + n). In practice, we usually use
counting sort when we have k = O(n), in which case the running
time is O(n).
▪ Counting sort can beat the lower bound of Ω(n lg n) because it is
not a comparison sort. In fact, no comparisons between input
elements occur anywhere in the code.
▪ Instead, counting sort uses the actual values of the elements to
index into an array. The Ω(n lg n) lower bound for sorting does not
apply when we depart from the comparison sort model.
Radix Sort
▪ Lemma:
– Given n d-digit numbers in which each digit can take on up to k possible
values, radix sort correctly sorts these numbers in O(d(n + k)) time if the
stable sort it uses takes O(n + k) time.
▪ Proof:
– The analysis of the running time depends on the stable sort used as the
intermediate sorting algorithm. When each digit lies in the range 0 to k-1, and k is
not too large, counting sort is the obvious choice. Each pass over n d-digit
numbers then takes O(n + k) time. There are d passes, and so the total time for
radix sort is O(d(n + k)).
– When d is constant and k = O(n), we can make radix sort run in linear time.
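A sketch of least-significant-digit radix sort using a stable counting sort on one digit per pass (the base and digit-extraction helper are our own illustrative choices):

```python
def radix_sort(a, d, base=10):
    """Sort non-negative ints with at most d base-`base` digits each."""
    for p in range(d):                        # one stable pass per digit,
        def digit(x):                         # least significant digit first
            return (x // base ** p) % base
        # stable counting sort keyed on the current digit: O(n + base) per pass
        c = [0] * base
        for x in a:
            c[digit(x)] += 1
        for j in range(1, base):
            c[j] += c[j - 1]
        b = [0] * len(a)
        for x in reversed(a):
            c[digit(x)] -= 1
            b[c[digit(x)]] = x
        a = b
    return a

print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
# [329, 355, 436, 457, 657, 720, 839]
```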
Lesson Five