
Design and Analysis of Algorithms
Unit 1
Searching, Sorting, Selection

1. Insertion Sort
2. Analyzing Algorithms
3. Heap Sort
4. Non-Comparison Sorting Algorithms
5. Minimum and Maximum
Lesson One

▪ We will cover these points


– Insertion Sort Algorithm
– Loop invariants and Correctness of Insertion Sort
Algorithm
Insertion Sort Algorithm

Input: A sequence of n numbers [a1, a2, ... , an].


Output: A permutation (reordering) [a1', a2', ... , an'] of
the input sequence such that a1' ≤ a2' ≤ ... ≤ an'.
• Insertion sort: A card sorting analogy
▪ Start with your left hand empty and a pile of cards on the
table.
▪ Pick up the first card and hold it in your left hand.
▪ One by one, take cards from the pile with your right hand and
insert them into the correct position in your left hand.
▪ Repeat until all cards are sorted in your left hand.
Finding The Correct Position

• Compare the card in your right hand


with the cards in your left hand,
moving from right to left.
• Insert the card to the right of the first
card in your left hand that is less than
or equal to it.
▪ If no such card exists, place it as the
leftmost card in your left hand.
Key Insight:
▪ The cards in your left hand are always
sorted.
▪ These cards come sequentially from the top of the pile on the table.
Insertion Sort Algorithm

▪ The pseudocode for insertion sort is given as the procedure INSERTION-SORT


below.
▪ It takes two parameters: an array A containing the values to be sorted and the
number n of values to sort.
▪ The values occupy positions A[1] through A[n] of the array, which we denote by
A[1:n].
▪ When the INSERTION-SORT procedure is finished, array A[1:n] contains the original
values, but in sorted order.
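Since the pseudocode figure itself is not reproduced in this text version, here is a Python sketch of the standard INSERTION-SORT procedure, with comments keyed to the CLRS line numbers that the following slides reference (the array is treated as 1-based, so A[0] is an unused slot):

def insertion_sort(A, n):
    # Sorts A[1:n] in place; A[0] is unused so that indices match
    # the 1-based pseudocode in the slides.
    for i in range(2, n + 1):           # line 1: for i = 2 to n
        key = A[i]                      # line 2: the "card" picked up
        # line 3 (comment): insert key into the sorted subarray A[1:i-1]
        j = i - 1                       # line 4
        while j > 0 and A[j] > key:     # line 5: scan right to left
            A[j + 1] = A[j]             # line 6: shift a larger value right
            j = j - 1                   # line 7
        A[j + 1] = key                  # line 8: drop the key into place

For example, with A = [None, 5, 2, 4, 6, 1, 3] (a dummy in slot 0, then the slides' sample input), insertion_sort(A, 6) leaves A[1:] equal to [1, 2, 3, 4, 5, 6].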
Working of the Insertion Sort
▪ The operation of INSERTION-SORT(A, n), where A initially contains the sequence
[5, 2, 4, 6, 1, 3] and n = 6.
▪ Array indices appear above the rectangles, and values stored in the array
positions appear within the rectangles.
▪ (a)–(e) The iterations of the for loop of lines 1-8.
▪ In each iteration, the blue rectangle holds the key taken from A[i], which is
compared with the values in tan rectangles to its left in the test of line 5. Orange
arrows show array values moved one position to the right in line 6, and blue
arrows indicate where the key moves to in line 8.
▪ (f) The final sorted array.
Loop Invariants of Insertion Sort

▪ This algorithm works for an array A that starts out with the sequence [5, 2, 4,
6, 1, 3].
▪ The index i indicates the "current card" being inserted into the hand.
▪ At the beginning of each iteration of the for loop, which is indexed by i, the
subarray consisting of elements A[1 : i–1] constitutes the currently
sorted hand, and the remaining subarray A[i+1 : n] corresponds to the pile of
cards still on the table.
▪ In fact, elements A[1 : i-1] are the elements originally in positions 1 through i-1,
but now in sorted order.
Loop Invariants of Insertion Sort

▪ We state these properties of A[1 : i–1] formally as a loop


invariant:
– At the start of each iteration of the for loop of lines 1-8, the
subarray A[1 : i-1] consists of the elements originally in A[1 : i-1],
but in sorted order.
▪ Understanding Loop Invariants:
▪ Initialization: Show the invariant holds true before the first loop
iteration.
▪ Maintenance: Prove that if the invariant is true before an iteration,
it remains true afterward.
▪ Termination: At loop termination, combine the invariant with the
termination condition to confirm the algorithm’s correctness.
Understanding Loop Invariants

▪ Key Idea:
– A loop invariant is true before every iteration of the loop. Use
established facts (beyond the invariant) to prove this.

▪ Connection to Induction:
– Base Case: Show the invariant holds before the first iteration.
– Inductive Step: Prove it holds for each iteration.

▪ Why It Matters:
– The third property ensures correctness by combining the
invariant with the termination condition.
– Unlike infinite induction, the process stops when the loop ends.
Checking Loop Invariants of
Insertion Sort for Correctness of
Algorithm

▪ Initialization Step: Before the first iteration (when i=2):


– The subarray A[1 : i-1] contains only A[1].
– A[1] remains unchanged and, being a single element, is inherently
sorted.
– Thus, the loop invariant holds before the first iteration.
▪ Maintenance Step: During each iteration:
– The loop shifts elements A[i−1],A[i−2],… one position right to find the
correct spot for A[i].
– Once inserted, A[1 : i] becomes sorted, containing the same elements
as before but in order.
– Incrementing i for the next iteration ensures the loop invariant holds.
▪ Note: A formal proof would involve establishing a loop invariant for the inner while
loop (lines 5–7), but our informal reasoning suffices here to show the invariant is
maintained for the outer loop.
Checking Loop Invariants of
Insertion Sort for Correctness of
Algorithm

▪ Termination Step:
– The loop starts with i=2 and increments by 1 in each iteration.
– The loop terminates when i > n, i.e., i = n+1.

▪ At termination, substituting i = n+1 into the loop invariant


shows:
– The subarray A[1 : n] contains the same elements as the original A[1 : n]
but now sorted.

▪ This confirms the algorithm’s correctness.


Lesson Two

▪ We will cover these points


– Analyzing Algorithms
– Analysis of Insertion Sort
Analyzing Algorithms

▪ Focus on predicting the resources an algorithm uses, such as memory,


bandwidth, energy, or—most commonly—computational time.
▪ By comparing candidate algorithms, you can identify the most efficient
one.
▪ While multiple viable options may exist, analysis often helps eliminate
less efficient choices.
▪ To analyze an algorithm, you need a model of the computing
technology, its resources, and their costs.
▪ So, we will assume a RAM (Random-Access Machine) model, where:
– Instructions execute sequentially, with no concurrent operations.
– All instructions and data accesses (e.g., variable usage or array indexing) take a
constant amount of time.
– This simplification allows for consistent and practical analysis of algorithm efficiency.
Analysis of Insertion Sort

▪ Evaluating the Runtime of INSERTION-SORT:


– Timing Tests: Running the algorithm on a computer gives specific
results influenced by:
• The computer hardware and background tasks.
• The programming language and libraries used.
• The compiler or interpreter's efficiency.
• The specific input data.
– Limitations: Timing tests only reveal how the algorithm performs
in that exact scenario. They don't generalize to:
• Different inputs.
• Other systems or implementations.
– Better Approach: To predict runtime across varying conditions,
we analyze the algorithm's performance mathematically,
independent of specific implementations or environments.
Analysis of Insertion Sort

▪ Running Time in the RAM Model:


– Definition: The running time of an algorithm is the total
number of instructions and data accesses executed on a
given input.
– Constant Time Per Line: We will assume that each line of
pseudocode takes a constant time to execute. Specifically,
each line k takes a constant ck amount of time.
– Consistency: This approach abstracts away specific
hardware or implementation details, focusing on the
algorithm's performance in a general context. It reflects how
the pseudocode would likely be executed on most real
computers.
Analysis of Insertion Sort

▪ Precise Formula:
– Initially, we could compute the running time of INSERTION-SORT by
counting the number of executions of each statement, factoring in
the constant costs ck for each line. This gives a precise formula but
is often complex and cumbersome to work with.
▪ Simplified Notation:
– To make comparisons easier, especially as the input size grows, we
switch to a simpler, more concise notation.
– This big-O notation allows us to express the algorithm’s running
time in a way that highlights how it scales with input size, making
it much easier to compare different algorithms.
Analysis of Insertion Sort

▪ To analyze the INSERTION-SORT procedure, let’s view it with the time cost of
each statement and the number of times each statement is executed.

▪ For each i = 2, 3, ... , n, let ti denote the number of times the while loop test in
line 5 is executed for that value of i.
▪ When a for or while loop exits in the usual way - because the test in the loop
header comes up FALSE - the test is executed one time more than the loop body.
Analysis of Insertion Sort

▪ The total running time T(n) is the sum of the running times for each statement
executed.
▪ If a statement takes ck steps to execute and runs m times, it contributes
ck·m to the total running time.
▪ We usually denote the running time of an algorithm on an input of size n by
T(n). To compute T(n), the running time of INSERTION-SORT on an input of n
values, we sum the products of the cost and times columns, obtaining:
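The cost/times table and the resulting expression are omitted from this text rendering; with ck denoting the constant cost of line k (line 3 is a comment and contributes nothing), the standard tally is

T(n) = c1·n + c2·(n−1) + c4·(n−1) + c5·Σ ti + c6·Σ (ti − 1) + c7·Σ (ti − 1) + c8·(n−1),

where each sum Σ runs over i = 2, 3, ... , n.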
Analysis of Insertion Sort

▪ Even for inputs of a given size, an algorithm’s running time may


depend on which input of that size is given.
▪ Best Case: When the array is already sorted.
– In this case, each time that line 5 executes, the value originally in A[i] is
already greater than or equal to all values in A[1 : i-1], so that the while loop
of lines 5-7 always exits upon the first test in line 5.
– Therefore, we have that ti = 1 for i = 2, 3, ... , n, and the best-case running
time is given by
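Substituting ti = 1 into the tally above makes the (ti − 1) sums vanish, leaving

T(n) = c1·n + (c2 + c4 + c5 + c8)·(n−1)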

▪ Since this running time can be expressed as an + b for constants a and b,


it is a linear function of n. So, T(n) = Ω(n).
Analysis of Insertion Sort

▪ Worst Case: When the array is reverse sorted.


– The procedure must compare each element A[i] with each element in the
entire sorted subarray A[1 : i-1], and so ti = i for i = 2, 3, ... , n. (The
procedure finds that A[j] > key every time in line 5, and the while loop
exits only when j reaches 0.) Noting that

Σ(i = 2 to n) i = n(n+1)/2 − 1

and

Σ(i = 2 to n) (i − 1) = n(n−1)/2,

the calculation of the worst case becomes a bit more complex, which is

T(n) = c1·n + (c2 + c4 + c8)·(n−1) + c5·(n(n+1)/2 − 1) + (c6 + c7)·(n(n−1)/2)
Analysis of Insertion Sort

▪ This can be expressed as an² + bn + c for constants a, b, and c, which is a


quadratic function of n. So, T(n) = O(n²).
Analysis of Insertion Sort

▪ Average Case: The "average case" is often roughly as bad as


the worst case.
– When running insertion sort on a randomly chosen array of n numbers,
the time to determine where to insert A[i] in the sorted subarray A[1 : i−1]
depends on the number of comparisons.
– On Average: Half of the elements in A[1 : i−1] are smaller than A[i], and
half are larger. Therefore, A[i] will be compared with about half of the
subarray, so the number of comparisons ti is approximately i/2.

▪ Average-Case Running Time:


– The total number of comparisons across all iterations results in a quadratic
running time. Specifically, the average-case running time is also
quadratic, just like the worst-case running time.
– Thus, the average case does not provide significant improvement over the
worst case for insertion sort, as both have quadratic time complexity O(n²).
Lesson Three

▪ We will cover these points


– Heaps (Minheap and Maxheap)
– Maintaining the Heap Property
– Building a Heap
– Heapsort Algorithm
Introduction

▪ Algorithm Design Technique:


– Unlike algorithms like Bubble Sort, Selection Sort, and Insertion Sort (which
use an iterative approach), and Merge Sort and Quick Sort (which use the
divide-and-conquer approach), Heapsort uses a different technique: it leverages
a heap data structure to manage and sort the data.

▪ The term "Heap" was originally coined in the context of heapsort, but
it has since come to refer to "garbage-collected storage", such as the
programming languages Java and Python provide.
▪ Please don’t be confused. The heap data structure is not
garbage-collected storage.
▪ Here we will be using the term "Heap" to refer to the data structure,
not the storage class.
Heap

▪ Heap:
– The (binary) heap data structure is an array object
that we can view as a nearly complete binary tree.
▪ Each node of the tree corresponds to an element of the array.
▪ The tree is filled on all levels except possibly the lowest, which
is filled from the left up to a point.
– A heap is a specialized binary tree-based data
structure that satisfies the heap property:
▪ In a max-heap, the value of each node is greater than or equal
to the values of its children and in a min-heap, the value of
each node is less than or equal to the values of its children.
Heap

▪ An array A[1 : n] that represents a heap is an object with


an attribute A.heap-size, which represents how many
elements in the heap are stored within array A.
▪ That is, although A[1 : n] may contain numbers, only the
elements in A[1 : A.heap-size], where 0 ≤ A.heap-size ≤ n, are valid
elements of the heap.
▪ If A.heap-size = 0, then the heap is empty.
▪ The root of the tree is A[1], and given the index i of a
node, there’s a simple way to compute the indices of its
parent, left child, and right child with the one-line
procedures PARENT, LEFT, and RIGHT.
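As a sketch (1-based indices, matching the array layout above), these one-line procedures might look like this in Python; the bit-shift forms discussed on a later slide are noted in the comments:

def parent(i):
    return i // 2        # floor(i/2); equivalently i >> 1

def left(i):
    return 2 * i         # equivalently i << 1

def right(i):
    return 2 * i + 1     # equivalently (i << 1) + 1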
A max-heap Example

▪ A max-heap viewed as (a) a binary tree and (b) an array.


▪ The number within the circle at each node in the tree is the value stored at that node.
▪ The number above a node is the corresponding index in the array.
▪ Above and below the array are lines showing parent-child relationships, with parents
always to the left of their children.
▪ The tree has height 3, and the node at index 4 (with value 8) has height 1.
Finding Left Child, Right Child and
Parent of a Node in Heap

▪ On most computers, the LEFT procedure can


compute 2i in one instruction by simply shifting
the binary representation of i left by one bit
position.
▪ Similarly, the RIGHT procedure can quickly
compute 2i +1 by shifting the binary
representation of i left by one bit position and
then adding 1.
▪ The PARENT procedure can compute floor(i/2) by
shifting i right by one bit position.
▪ Good implementations of heapsort often
implement these procedures as macros or
inline procedures.
Max-Heaps and Max-Heap Property

▪ There are two kinds of binary heaps:


– max-heaps and min-heaps.

▪ In both kinds, the values in the nodes satisfy a heap


property, the specifics of which depend on the kind of
heap.
▪ In a max-heap, the max-heap property is that for every
node i other than the root, A[PARENT(i)] ≥ A[i], that is,
the value of a node is at most the value of its parent.
▪ Thus, the largest element in a max-heap is stored at the
root, and the subtree rooted at a node contains values
no larger than that contained at the node itself.
Heaps and Operations on Heaps

▪ Viewing a heap as a tree, we define the height of a node


in a heap to be the number of edges on the longest
simple downward path from the node to a leaf, and we
define the height of the heap to be the height of its root.
▪ Since a heap of n elements is based on a complete
binary tree, its height is O(lg n).
▪ So, the basic operations on heaps run in time at most
proportional to the height of the tree and thus take O(lg
n) time.
Heaps and Operations on Heaps

▪ The MAX-HEAPIFY procedure, which runs in O(lg n) time,


is the key to maintaining the max-heap property
▪ The BUILD-MAX-HEAP procedure, which runs in linear
time, produces a max-heap from an unordered input array.
▪ The HEAPSORT procedure, which runs in O(n lg n) time,
sorts an array in place.
Maintaining the Heap Property

▪ The procedure MAX-HEAPIFY, sketched


below, maintains the max-heap property.
▪ Its inputs are an array A with the heap-
size attribute and an index i into the
array.
▪ When it is called, MAX-HEAPIFY assumes
that the binary trees rooted at LEFT(i)
and RIGHT(i) are max-heaps, but that
A[i] might be smaller than its children,
thus violating the max-heap property.
▪ MAX-HEAPIFY lets the value at A[i] "float
down" in the max-heap so that the
subtree rooted at index i obeys the
max-heap property.
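A Python sketch of MAX-HEAPIFY, following the standard CLRS procedure and reusing the left/right helpers sketched earlier (heap_size is passed explicitly rather than read from an A.heap-size attribute):

def max_heapify(A, heap_size, i):
    l, r = left(i), right(i)
    # Find the largest among A[i] and its two children.
    largest = i
    if l <= heap_size and A[l] > A[largest]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        # Float A[i] down one level and continue in that subtree.
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, heap_size, largest)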
Working of MAX-HEAPIFY

• The action of MAX-


HEAPIFY(A, 2), where A.heap-
size = 10.
• The node that violates the max-
heap property is shown in
blue.
• The initial configuration, with
A[2] at node i=2 violating
the max-heap property since
it is not larger than both
children.
Working of MAX-HEAPIFY

• The max-heap property is


restored for node 2 by
exchanging A[2] with A[4],
which destroys the max-heap
property for node 4.
• The recursive call MAX-
HEAPIFY(A, 4) now has i = 4.
Working of MAX-HEAPIFY

• After A[4] and A[9] are


swapped, node 4 is fixed
up, and the recursive call
MAX-HEAPIFY(A, 9) yields no
further change to the data
structure.
Analyzing MAX-HEAPIFY

▪ Let T(n) be the worst-case running time that the procedure takes on a
subtree of size at most n.
▪ For a tree rooted at a given node i, the running time is the O(1) time
to fix up the relationships among the elements A[i], A[LEFT(i)], and
A[RIGHT(i)], plus the time to run MAX-HEAPIFY on a subtree rooted at
one of the children of node i (if the recursive call occurs).
▪ The children’s subtrees each have size at most 2n/3, and therefore we
can describe the running time of MAX-HEAPIFY by the recurrence T(n)
≤ T(2n/3) + O(1).
▪ The solution to this recurrence, by case 2 of the master theorem, is
T(n) = O(lg n).
▪ Alternatively, we can characterize the running time of MAX-HEAPIFY
on a node of height h as O(h).
Building a Heap

▪ The procedure BUILD-MAX-HEAP converts an array A[1 : n] into a


max-heap by calling MAX-HEAPIFY in a bottom-up manner.
▪ The elements in the subarray A[floor(n/2) + 1 : n] are all leaves of
the tree, and so each is a 1-element heap to begin with.
▪ BUILD-MAX-HEAP goes through the remaining nodes of the tree
and runs MAX-HEAPIFY on each one.
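A Python sketch of BUILD-MAX-HEAP (the call to max_heapify in the loop body corresponds to line 3 of the pseudocode, which the figure captions below reference):

def build_max_heap(A, n):
    heap_size = n
    # A[floor(n/2)+1 : n] are leaves, i.e., 1-element heaps already;
    # run max_heapify on the internal nodes from the bottom up.
    for i in range(n // 2, 0, -1):
        max_heapify(A, heap_size, i)    # line 3 in the slides' numbering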
Action of BUILD-MAX-HEAP

▪ The operation of BUILD-MAX-


HEAP, showing the data
structure before the call to
MAX-HEAPIFY in line 3 of
BUILD-MAX-HEAP. The node
indexed by i in each iteration
is shown in blue.
▪ (a) A 10-element input array
A and the binary tree it
represents. The loop index i
refers to node 5 before the
call MAX-HEAPIFY(A, i).
Action of BUILD-MAX-HEAP

▪ The operation of BUILD-MAX-


HEAP, showing the data
structure before the call to
MAX-HEAPIFY in line 3 of
BUILD-MAX-HEAP. The node
indexed by i in each iteration
is shown in blue.
▪ (b) The data structure that
results. The loop index i for
the next iteration refers to
node 4.
Action of BUILD-MAX-HEAP

▪ (c)–(e) Subsequent iterations of the for loop in BUILD-MAX-


HEAP. Observe that whenever MAX-HEAPIFY is called on a
node, the two subtrees of that node are both max-heaps.
Action of BUILD-MAX-HEAP

▪ (f) The max-heap after BUILD-MAX-HEAP finishes.


Correctness of BUILD-MAX-HEAP

▪ To show why BUILD-MAX-HEAP works correctly, we use the


following loop invariant:
– At the start of each iteration of the for loop of lines 2-3, each node i+1,
i+2, ... , n is the root of a max-heap.

▪ We need to show that this invariant is true prior to the first loop
iteration, that each iteration of the loop maintains the
invariant, that the loop terminates, and that the invariant
provides a useful property to show correctness when the loop
terminates.
▪ Initialization:
– Prior to the first iteration of the loop, i = floor(n/2). Each node floor(n/2) + 1,
floor(n/2) + 2, ... , n is a leaf and is thus the root of a trivial max-heap.
Correctness of BUILD-MAX-HEAP

▪ Maintenance:
– To see that each iteration maintains the loop invariant, observe that the
children of node i are numbered higher than i.
– By the loop invariant, therefore, they are both roots of max-heaps.
– This is precisely the condition required for the call MAX-HEAPIFY(A, i) to
make node i a max-heap root.
– Moreover, the MAX-HEAPIFY call preserves the property that nodes i+1,
i+2, ... , n are all roots of max-heaps. Decrementing i in the for loop
update reestablishes the loop invariant for the next iteration.

▪ Termination:
– The loop makes exactly floor(n/2) iterations, and so it terminates. At
termination, i = 0. By the loop invariant, each node 1, 2, ... , n is the
root of a max-heap. In particular, node 1 is.
Analysis of BUILD-MAX-HEAP

▪ We can compute a simple upper bound on the running time of


BUILD-MAX-HEAP as follows.
– Each call to MAX-HEAPIFY costs O(lg n) time, and BUILD-MAX-HEAP
makes O(n) such calls.
– Thus, the running time is O(n lg n).
– This upper bound, though correct, is not as tight as it can be.
▪ We can derive a tighter asymptotic bound by observing that the
time for MAX-HEAPIFY to run at a node varies with the height of the
node in the tree, and that the heights of most nodes are small.
– Our tighter analysis relies on the properties that an n-element heap has
height floor(lg n) and at most ceil(n/2^(h+1)) nodes of any height h.
Analysis of BUILD-MAX-HEAP

▪ The time required by MAX-HEAPIFY when called on a node of height h is


O(h).
▪ Letting c be the constant implicit in the asymptotic notation, we can
express the total cost of BUILD-MAX-HEAP as being bounded from
above by

Σ(h = 0 to floor(lg n)) ceil(n/2^(h+1)) · c·h

▪ We have ceil(n/2^(h+1)) ≥ 1/2 for 0 ≤ h ≤ lg n. Since ceil(x) ≤ 2x for any
x ≥ 1/2, we have ceil(n/2^(h+1)) ≤ n/2^h. We thus obtain

Σ(h = 0 to floor(lg n)) ceil(n/2^(h+1)) · c·h ≤ c·n · Σ(h = 0 to ∞) h/2^h = c·n · 2 = O(n)
▪ Hence, we can build a max-heap from an unordered array in linear


time.
Heapsort Algorithm

▪ The heapsort algorithm,


given by the procedure
HEAPSORT, starts by
calling the BUILD-MAX-
HEAP procedure to build
a max-heap on the input
array A[1 : n].
▪ Since the maximum
element of the array is
stored at the root A[1],
HEAPSORT can place it
into its correct final
position by exchanging it
with A[n].
Heapsort Algorithm

▪ If the procedure then


discards node n from the
heap - and it can do so by
simply decrementing
A.heap-size - the children
of the root remain max-
heaps, but the new root
element might violate the
max-heap property.
▪ To restore the max-heap
property, the procedure
just calls MAX-HEAPIFY(A,
1), which leaves a max-
heap in A[1 : n-1].
Heapsort Algorithm

▪ The HEAPSORT
procedure then repeats
this process for the max-
heap of size n-1 down to
a heap of size 2.
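Putting the pieces together, a Python sketch of HEAPSORT (the first and last calls correspond to lines 1 and 5 of the pseudocode, which the figure captions below reference):

def heapsort(A, n):
    build_max_heap(A, n)                # line 1: build a max-heap on A[1:n]
    heap_size = n
    for i in range(n, 1, -1):           # i = n downto 2
        A[1], A[i] = A[i], A[1]         # move the maximum to its final slot
        heap_size -= 1                  # discard node i from the heap
        max_heapify(A, heap_size, 1)    # line 5: restore the max-heap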
The operation of HEAPSORT

▪ (a) The max-heap data structure just after BUILD-MAX-HEAP has built it
in line 1.
▪ (b)–(c) The max-heap just after each call of MAX-HEAPIFY in line 5,
showing the value of i at that time. Only blue nodes remain in the
heap. Tan nodes contain the largest values in the array, in sorted order.
The operation of HEAPSORT

▪ (d)–(f) The max-heap just after each call of MAX-HEAPIFY in


line 5, showing the value of i at that time. Only blue nodes
remain in the heap. Tan nodes contain the largest values in
the array, in sorted order.
The operation of HEAPSORT

▪ (g)–(i) The max-heap just after each call of MAX-HEAPIFY in


line 5, showing the value of i at that time. Only blue nodes
remain in the heap. Tan nodes contain the largest values in
the array, in sorted order.
The operation of HEAPSORT

▪ (j) The max-heap just after each call of MAX-HEAPIFY in line 5,


showing the value of i at that time. Only blue nodes remain in
the heap. Tan nodes contain the largest values in the array, in
sorted order.
▪ (k) The resulting sorted array A.
Analysis of HEAPSORT

▪ The HEAPSORT procedure takes O(n lg n) time, since the call


to BUILD-MAX-HEAP takes O(n) time and each of the n-1 calls
to MAX-HEAPIFY takes O(lg n) time.
Lesson Four

▪ We will cover these points


– Lower Bounds for Sorting
– The Decision Tree Model and A Lower Bound for The
Worst Case
– Counting Sort Algorithm
– Radix Sort Algorithm
Introduction

▪ The algorithms we have seen till now take worst-case


time O(n²) or O(n lg n).
▪ These algorithms share an interesting property: the
sorted order they determine is based only on
comparisons between the input elements, which is why
we call them Comparison Sorts.
▪ We’ll prove that any comparison sort must make Ω(n lg
n) comparisons in the worst case to sort n elements.
▪ Thus, merge sort and heapsort are asymptotically
optimal, and no comparison sort exists that is faster by
more than a constant factor.
Lower Bounds for Sorting

▪ A comparison sort uses only comparisons between


elements to gain order information about an input
sequence. It may not inspect the value of the
elements or gain order information about them in any
other way.
▪ To prove a lower bound, we assume without loss of
generality in this section that all the input elements
are distinct. We also assume that all comparisons
have the form ai ≤ aj.
Decision Tree Model

▪ Comparison sorts can be viewed abstractly as decision trees.


A decision tree is a full binary tree that represents the
comparisons between elements that are performed by a
particular sorting algorithm operating on an input of a given
size, e.g., the decision tree for insertion sort operating on three
elements.
A Lower Bound for Worst Case

▪ The length of the longest simple path from the root of a


decision tree to any of its reachable leaves represents the
worst-case number of comparisons that the corresponding
sorting algorithm performs.
▪ Consequently, the worst-case number of comparisons for a
given comparison sort algorithm equals the height of its
decision tree.
▪ A lower bound on the heights of all decision trees in which
each permutation appears as a reachable leaf is therefore a
lower bound on the running time of any comparison sort
algorithm.
A Lower Bound for Worst Case

▪ Theorem:
– Any comparison sort algorithm requires Ω(n lg n) comparisons in the
worst case.

▪ Proof:
– From the preceding slide, it suffices to determine the height of a
decision tree in which each permutation appears as a reachable leaf.
– Consider a decision tree of height h with l reachable leaves
corresponding to a comparison sort on n elements.
– Because each of the n! permutations of the input appears as one or
more leaves, we have n! ≤ l.
– Since a binary tree of height h has no more than 2^h leaves, we have
n! ≤ l ≤ 2^h
– Taking logarithms gives h ≥ lg(n!), which in turn implies h = Ω(n lg n),
since lg(n!) = Θ(n lg n) by Stirling's approximation.
Counting Sort Algorithm

▪ Counting sort assumes that each of the n input elements is an


integer in the range 0 to k, for some integer k. It runs in O(n + k)
time, so that when k = O(n), counting sort runs in O(n) time.
▪ Counting sort first determines, for each input element x, the
number of elements less than or equal to x.
▪ It then uses this information to place element x directly into its
position in the output array.
▪ The COUNTING-SORT procedure on the next slide takes as input an
array A[1 : n], the size n of this array, and the limit k on the
nonnegative integer values in A. It returns its sorted output in the
array B[1 : n] and uses an array C [0 : k] for temporary working
storage.
Counting Sort Algorithm
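The pseudocode figure is not reproduced here; the following Python sketch follows the standard COUNTING-SORT procedure, with comments keyed to the CLRS line numbers that the caption and analysis below reference:

def counting_sort(A, n, k):
    B = [0] * (n + 1)              # output array B[1:n]            (line 1)
    C = [0] * (k + 1)              # working storage C[0:k], zeroed (lines 2-3)
    for j in range(1, n + 1):      # lines 4-5
        C[A[j]] += 1               # C[v] = number of elements equal to v
    for i in range(1, k + 1):      # lines 7-8
        C[i] += C[i - 1]           # C[i] = number of elements <= i
    for j in range(n, 0, -1):      # lines 11-13: scan backwards for stability
        B[C[A[j]]] = A[j]          # place A[j] at its final position
        C[A[j]] -= 1
    return B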
The operation of COUNTING-SORT

(a) The array A and the auxiliary array C after line 5. (b) The
array C after line 8. (c)–(e) The output array B and the auxiliary
array C after one, two, and three iterations of the loop in lines 11-
13, respectively. Only the tan elements of array B have been filled
in. (f) The final sorted output array B.
Analysis of Counting Sort

▪ The for loop of lines 2-3 takes O(k) time. The for loop of lines 4-5
takes O(n) time. The for loop of lines 7-8 takes O(k) time. The for
loop of lines 11-13 takes O(n) time.
▪ Thus, the overall time is O(k + n). In practice, we usually use
counting sort when we have k = O(n), in which case the running
time is O(n).
▪ Counting sort can beat the lower bound of Ω(n lg n) because it is
not a comparison sort. In fact, no comparisons between input
elements occur anywhere in the code.
▪ Instead, counting sort uses the actual values of the elements to
index into an array. The Ω(n lg n) lower bound for sorting does not
apply when we depart from the comparison sort model.
Radix Sort

▪ Radix Sort is a linear-time sorting algorithm that sorts


numbers by processing individual digits. Radix Sort
uses Counting Sort as a subroutine because it needs a stable
sort algorithm.
▪ Radix Sort processes numbers digit by digit, starting from the
least significant digit to the most significant digit. It stably
sorts the numbers based on each digit using a stable sort.
▪ The code for radix sort is straightforward. The RADIX-SORT
procedure assumes that each element in array A[1 : n] has d
digits, where digit 1 is the lowest-order digit and digit d is the
highest-order digit.
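As a sketch, here is RADIX-SORT in Python with counting sort on each digit as the stable subroutine (the base parameter and digit extraction are illustrative assumptions; the slides' pseudocode only requires some stable sort):

def radix_sort(A, n, d, base=10):
    # A[1:n] holds d-digit numbers in the given base;
    # digit 1 is the lowest-order digit.
    for p in range(d):
        digit = lambda x: (x // base ** p) % base
        # Stable counting sort keyed on the current digit.
        C = [0] * base
        for j in range(1, n + 1):
            C[digit(A[j])] += 1
        for v in range(1, base):
            C[v] += C[v - 1]
        B = [0] * (n + 1)
        for j in range(n, 0, -1):   # backwards scan keeps the sort stable
            B[C[digit(A[j])]] = A[j]
            C[digit(A[j])] -= 1
        A[1:] = B[1:]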
Radix Sort

▪ Although the pseudocode for RADIX-SORT does not specify which


stable sort to use, COUNTING-SORT is commonly used. The correctness
of radix sort follows by induction on the column being sorted.
▪ Lemma:
– Given n d-digit numbers in which each digit can take on up to k possible values,
RADIX-SORT correctly sorts these numbers in O(d(n + k)) time if the stable sort it
uses takes O(n + k) time.

▪ Proof:
– The analysis of the running time depends on the stable sort used as the
intermediate sorting algorithm. When each digit lies in the range 0 to k-1, and k is
not too large, counting sort is the obvious choice. Each pass over n d-digit
numbers then takes O(n + k) time. There are d passes, and so the total time for
radix sort is O(d(n + k)).
– When d is constant and k = O(n), we can make radix sort run in linear time.
Lesson Five

▪ We will cover these points


– ith Order Statistic
– Minimum and Maximum
ith Order Statistic

▪ The ith order statistic of a set of n elements is the ith


smallest element.
▪ The minimum of a set of elements is the first order statistic
(i = 1), and the maximum is the nth order statistic (i = n).
▪ A median, informally, is the halfway point of the set. When
n is odd, the median is unique, occurring at i = (n+1)/2.
When n is even, there are two medians, the lower median
occurring at i = n/2 and the upper median occurring at i =
(n/2) + 1.
▪ Thus, regardless of the parity of n, medians occur at i =
floor((n+1)/2) and i = ceil((n+1)/2). For simplicity we
will be using the word median to refer to the lower median.
Selection Problem

▪ Input: A set A of n distinct numbers and an integer i, with


1 ≤ i ≤ n.
▪ Output: The element x belonging to A that is larger than
exactly i - 1 other elements of A.
▪ We can solve the selection problem in O(n lg n) time
simply by sorting the numbers using heapsort or merge
sort and then outputting the ith element in the sorted
array. But there exist asymptotically faster algorithms also.
Minimum and Maximum

▪ We can find the minimum element of an array A of n distinct numbers with


the help of at most n-1 comparisons. The MINIMUM procedure assumes that
the array is A[1 : n]; a sketch follows below.
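A Python sketch of MINIMUM (a linear scan that keeps the smallest element seen so far):

def minimum(A, n):
    m = A[1]
    for i in range(2, n + 1):    # n-1 comparisons in total
        if A[i] < m:
            m = A[i]
    return m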

▪ We can find the maximum element in the same fashion.


▪ The tournament method to find the minimum element also takes n-1
comparisons. Since every element except the winner must lose at least
one comparison, n-1 comparisons are necessary to determine the
minimum. Hence the algorithm MINIMUM is optimal with respect to the
number of comparisons.
Simultaneous Minimum and Maximum

▪ Some applications need to find both the minimum and the


maximum of a set of n elements. We can determine both the
minimum and the maximum of n elements using O(n)
comparisons.
▪ We simply find the minimum and maximum independently, using
n-1 comparisons for each, for a total of 2n -2 = O(n)
comparisons.
▪ Although 2n-2 comparisons is asymptotically optimal, it is
possible to improve the leading constant. We can find both the
minimum and the maximum using at most 3*floor(n/2)
comparisons.
▪ The trick is to maintain both the minimum and maximum
elements seen thus far.
Simultaneous Minimum and Maximum

▪ By doing so, rather than processing each element of the input by


comparing it against the current minimum and maximum, at a
cost of 2 comparisons per element, process elements in pairs.
▪ Compare pairs of elements from the input first with each other,
then compare the smaller with the current minimum and the
larger with the current maximum, at a cost of 3 comparisons for
every 2 elements.
▪ Setting up initial values for the current minimum and maximum
depends on whether n is odd or even.
– If n is odd, set both the minimum and maximum to the value of the first
element, and then process the rest of the elements in pairs.
– If n is even, perform 1 comparison on the first 2 elements to determine the
initial values of the minimum and maximum, and then process the rest of
the elements in pairs as in the case for odd n, as in the sketch below.
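A Python sketch of this pairing scheme (the name min_and_max is illustrative, not from the slides):

def min_and_max(A, n):
    if n % 2 == 1:
        lo = hi = A[1]                   # odd n: seed both from the first element
        start = 2
    else:
        if A[1] < A[2]:                  # even n: 1 comparison seeds both
            lo, hi = A[1], A[2]
        else:
            lo, hi = A[2], A[1]
        start = 3
    for i in range(start, n, 2):         # 3 comparisons per remaining pair
        small, large = (A[i], A[i + 1]) if A[i] < A[i + 1] else (A[i + 1], A[i])
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi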
Simultaneous Minimum and Maximum

▪ Let’s count the total number of comparisons. If n is odd, then


3(n-1)/2 comparisons occur. If n is even, 1 initial comparison
occurs, followed by another 3(n-2)/2 comparisons, for a total of
(3n/2)-2. Thus, in either case, the total number of comparisons is
at most 3*floor(n/2).
