M4.arrays Searching Sorting
Array
(ADT/DS, Searching, and Sorting)
Instructor: Manikandan Narayanan
Weeks 3-4
• Books:
• Main textbook: “Data Structures and Algorithm Analysis in C++” by
Weiss (content, figures, slides, exercises/questions, etc.). – cited as
[WeissBook]
• Additional/optional book: “Practice of Programming” by Kernighan and
Pike (style of programming, programming exercises/questions, etc.) –
cited as [KPBook]
Outline for Module M4
• M4 Array
• M4.1 Array ADT vs. DS (incl. Matrix and Applications)
[RN]
Array (aka Vector) as an ADT
Array: ordered collection of items (of the same primitive or complex data type)
accessible by an integer index
class Array {
public:
    Array();
    // minimal reqd. functionality:
    void set(int i, Element v);
    Element get(int i);
    // optional:
    int find(Element e);
    void print();
    int size();
};
Key Property: Store/retrieve elements using an integer index (position).
What are the complexities of these operations?
[RN]
Other ADTs
Queue ADT (FIFO)
enqueue, dequeue
Set ADT
union, find, intersection, complement, etc.
Fan regulator
incSpeed, decSpeed, getSpeed, getCompanyName
Integer
size, isSigned, getValue, setValue, add, sub
Student
getRollNo, getHostel, getFavGame, setHostel, getSlots, setCGPA
[RN]
This course is…
…all about such ADTs, their
implementations (DS), and their prog.
applications!
ADT (interface) vs. DS (impl.)
Abstract Data Type (ADT) | Other Common Names                | Commonly Implemented with (DS)
Array                    | Vector                            | Array (static/dynamic)
List                     | Sequence                          | Array, Linked List
Queue                    |                                   | Array, Linked List
Double-ended Queue       | Dequeue, Deque                    | Array, Doubly-linked List
Stack                    |                                   | Array, Linked List
Associative Array        | Dictionary, Hash Map, Hash, Map** | Hash Table
Set                      |                                   | Red-black Tree, Hash Table
Priority Queue           | Heap                              | Heap
Binary Tree…             | …                                 | …
Graph…                   | ...                               | …
[From Abstract Data Types. Brilliant.org. Retrieved 06:42, August 13, 2024, from https://brilliant.org/wiki/abstract-data-types/ ]
Implementing Array ADT using
Array DS (contiguous memory locations)
4 2 7 2 9
class Array {
public:
    Array();
    // minimal reqd. functionality
    void set(int i, Element v);
    Element get(int i);
    // optional:
    int find(Element e);
    void print();
    int size();
};
Design decisions
− Size of the array? (Static vs. Dynamic)
− Maintain size separately or use a sentinel?
− On overflow: error or realloc?
− On underflow: error message or exit or silent?
− Are duplicates allowed? If so, what should find return?
− ...
[RN]
Array ADT using Array DS
(static array of some max size)
4 2 7 2 9
With certain design decisions:
class Array {
public:
    Array();
    // minimal reqd. functionality
    void set(int i, Element v);   // O(1)
    Element get(int i);           // O(1)
    // optional:
    int find(Element e);          // O(N)
    void print();                 // O(N)
    int size();                   // O(1)
};
[RN]
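The complexities above can be checked against a minimal sketch of a static-array implementation. The element type (int), the capacity MAX_SIZE = 100, the "assert on out-of-range index" policy, and "find returns the first match or -1" are all assumptions made for this sketch; they are one possible set of answers to the design decisions listed earlier, not the course's prescribed ones.

```cpp
#include <cassert>
#include <iostream>

// Assumptions for this sketch: int elements and a fixed capacity of 100.
using Element = int;
constexpr int MAX_SIZE = 100;

class Array {
public:
    Array() : count(0) {}

    // O(1): one address computation into contiguous storage.
    void set(int i, Element v) {
        assert(i >= 0 && i < MAX_SIZE);   // overflow is treated as an error here
        data[i] = v;
        if (i >= count) count = i + 1;    // size is maintained separately (no sentinel)
    }

    // O(1): same address computation as set.
    Element get(int i) const {
        assert(i >= 0 && i < count);
        return data[i];
    }

    // O(N): linear scan; with duplicates, the first matching index is returned.
    int find(Element e) const {
        for (int i = 0; i < count; ++i)
            if (data[i] == e) return i;
        return -1;                        // not found
    }

    // O(N): visits every stored element once.
    void print() const {
        for (int i = 0; i < count; ++i) std::cout << data[i] << ' ';
        std::cout << '\n';
    }

    // O(1): the size is stored, not recomputed.
    int size() const { return count; }

private:
    Element data[MAX_SIZE] = {};          // zero-initialized contiguous storage
    int count;
};
```

With the slide's example contents 4 2 7 2 9, find(2) returns index 1 under the "first match" policy.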
Implementation Details: Array DS
(contiguous memory locations)
Simplest data structure
− Acts as aggregate over primitives or other aggregates
− May have multiple dimensions
Contiguous storage
Random access in O(1)
Languages such as C use the type system to index appropriately
− e.g., a[i] and a[i + 1] refer to locations (memory addresses) that differ
by the size of the element type
Storage space:
− Fixed for arrays
− Dynamically allocatable (on the system’s stack or heap), but fixed once allocated
− Variable for vectors (internally, reallocation and copying)
[RN]
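The point about type-based indexing can be observed directly. The helper below is a small sketch (strideBytes is a hypothetical name, not from the slides) that measures the byte distance between a[i] and a[i + 1]; for contiguous storage it equals the size of the element type.

```cpp
#include <cassert>
#include <cstddef>

// Byte distance between the addresses of a[i] and a[i + 1].
// For a contiguous array of T this is sizeof(T), whatever T is.
template <typename T>
std::ptrdiff_t strideBytes(const T* a, int i) {
    assert(a != nullptr);
    const char* lo = reinterpret_cast<const char*>(a + i);
    const char* hi = reinterpret_cast<const char*>(a + i + 1);
    return hi - lo;
}
```

For an int array the stride is sizeof(int), for a double array sizeof(double); this is why a[i] costs O(1) regardless of i.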
Implementation Details: 2D Array
(aka Matrix) DS
Typically, 2D arrays stored as a single 1D
array in contiguous memory locations…
…in row-major order in C/C++
[RN]
Impl. Details (contd.): nD Array DS (as a
single 1D array)
In C, C++, Java, we use row-major storage.
− All elements of a row are stored together.
In Fortran, we use column-major storage.
− Each column is stored together.
[Figure: traversal order of matrix cells, labelled by (row, column), under row-major and column-major storage]
[RN]
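The two storage orders differ only in how a (row, column) pair is mapped to a position in the underlying 1D array. A minimal sketch of both mappings follows; the function names are hypothetical and indices are assumed to be 0-based.

```cpp
#include <cassert>

// Position of element (row, col) in the flat array when rows are stored one
// after another (row-major, as in C/C++). `cols` is the number of columns.
int rowMajorIndex(int row, int col, int cols) {
    assert(row >= 0 && col >= 0 && col < cols);
    return row * cols + col;
}

// Position of the same element when columns are stored one after another
// (column-major, as in Fortran). `rows` is the number of rows.
int colMajorIndex(int row, int col, int rows) {
    assert(row >= 0 && col >= 0 && row < rows);
    return col * rows + row;
}
```

For a 3 x 4 matrix, element (1, 2) sits at flat position 1 * 4 + 2 = 6 in row-major order and at 2 * 3 + 1 = 7 in column-major order.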
Impl. Details (contd.): nD Array DS (as a
single 1D array in C/C++)
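void fun(int a[ ][ ]) {
    a[0][0] = 20;
}
int main() {
    int a[5][10];
    fun(a);
    printf("%d\n", a[0][0]);
    return 0;
}
ERROR: type of formal parameter 1 is incomplete
(Only the first dimension may be left unspecified, e.g., int a[ ][10].)
We view an array to be a D-dimensional matrix. However, for the hardware, it is simply single dimensional.
Byte offset of a[i][j][k][l] in a 4D array of 4-byte elements, where w3, w2, w1 are the sizes of the last three dimensions:
− (i * w3 * w2 * w1 + j * w2 * w1 + k * w1 + l) * 4
How to optimize the computation?
− Use Horner's rule: (((i * w3 + j) * w2 + k) * w1 + l) * 4
[RN]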
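The two offset expressions above compute the same value; Horner's rule just needs fewer multiplications (4 instead of 7). A small sketch that evaluates both, with hypothetical function names and the element size passed in as a parameter instead of the literal 4:

```cpp
#include <cassert>

// Direct form: (i*w3*w2*w1 + j*w2*w1 + k*w1 + l) * elemSize
long offsetDirect(long i, long j, long k, long l,
                  long w3, long w2, long w1, long elemSize) {
    assert(elemSize > 0);
    return (i * w3 * w2 * w1 + j * w2 * w1 + k * w1 + l) * elemSize;
}

// Horner form: (((i*w3 + j)*w2 + k)*w1 + l) * elemSize
long offsetHorner(long i, long j, long k, long l,
                  long w3, long w2, long w1, long elemSize) {
    assert(elemSize > 0);
    return (((i * w3 + j) * w2 + k) * w1 + l) * elemSize;
}
```

For example, with (i, j, k, l) = (1, 2, 3, 4) and (w3, w2, w1) = (5, 6, 7), both forms give 319 elements, i.e. 1276 bytes for 4-byte elements.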
Array Applications: Programming
Homework
Merge two sorted arrays
− In a third array
− In situ (later also check with linked lists)
For a given data, create a histogram
− Numbers of students in [0..10), [10, 20), ..., [90, 100].
Given two arrays of sizes N1 and N2, find a product matrix
(P[i][j] = A[i] * B[j]).
− Can this be done in O(N1 + N2) time?
− or O(N1 log N2)?
• Given an unsorted array of (positive and negative) integers, the
task is to find the smallest positive number missing from the
array (in O(N) time?).
[RN]
Matrix Appn.: 8-Queens Problem
(Homework)
Given a chess-board,
can you place 8 queens
in non-attacking positions?
(no two queens in the same row
or same column or same diagonal)
Does a solution exist for 2x2, 3x3, 4x4?
[RN]
Search in a Sorted Matrix[M][N]
 3  5  9 20 39
 4  6 11 21 40
 7 10 12 23 45
 8 13 22 27 46
19 29 41 43 49
24 30 44 50 52
25 31 47 51 55
28 33 48 53 61
32 42 54 56 66
35 57 60 62 69
Focus on 44.
Check where all values < 44 appear.
Check where all values > 44 appear.
Classwork: Devise a method to search for an element in this matrix.
For now, let’s assume that all values are unique.
[RN]
Search in a Sorted Matrix[M][N]
Approach 1: Divide and Conquer
− < i, 0 and < 0, j → Q1
− < i, 0 and > 0, j → Q1, Q2
− > i, 0 and < 0, j → Q1, Q4
− > i, 0 and > 0, j → Q1, Q2, Q3, Q4
[Figure: matrix divided at row i and column j into quadrants Q1, Q2, …]
max(M,N))
Approach 3: Elimination
− Consider e: [0, N-1].
− If key == e, found the element
Source: wikipedia
[RN]
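The elimination approach starts from the top-right element e = [0, N-1]. The slide only gives the first steps, so the sketch below fills in the rest with the commonly used "staircase" rule, which is an assumption here: if key < e, discard e's column; if key > e, discard e's row. It relies on every row and every column being sorted in ascending order, as in the example matrix. The function name is hypothetical.

```cpp
#include <utility>
#include <vector>

// Returns {row, col} of key, or {-1, -1} if key is absent.
// Assumes each row and each column of `a` is sorted ascending.
// Each step discards one row or one column, so at most M + N steps are taken.
std::pair<int, int> searchSortedMatrix(const std::vector<std::vector<int>>& a, int key) {
    if (a.empty() || a[0].empty()) return {-1, -1};
    const int rows = static_cast<int>(a.size());
    const int cols = static_cast<int>(a[0].size());
    int r = 0, c = cols - 1;                 // start at the top-right corner
    while (r < rows && c >= 0) {
        const int e = a[r][c];
        if (key == e) return {r, c};         // found the element
        if (key < e) --c;                    // all of column c below e is even larger
        else ++r;                            // all of row r left of e is even smaller
    }
    return {-1, -1};
}
```

On the 10 x 5 example matrix, searching for 44 walks down the last column to 45, steps left, continues down to 50, steps left again, and finds 44 at row 5, column 2 (0-based).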
Selection Sort
Approach: Choose the minimum element, and
push it to its final place.
What is the invariant?
− First i elements are at their final places after i
iterations.
Homework: Write the code.
for (ii = 0; ii < N - 1; ++ii) {
    int iimin = ii;
    ...
Source: heap-sort.cpp
[RN]
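One way the selection sort loop could be completed is sketched below. It keeps the slide's variable names ii and iimin; the inner index jj, the std::vector parameter, and the function name are assumptions of this sketch.

```cpp
#include <utility>
#include <vector>

// Selection sort: after iteration ii, the first ii + 1 elements are the
// smallest ones, in sorted order, at their final places (the invariant).
void selectionSort(std::vector<int>& a) {
    const int N = static_cast<int>(a.size());
    for (int ii = 0; ii < N - 1; ++ii) {
        int iimin = ii;                       // index of the minimum seen so far
        for (int jj = ii + 1; jj < N; ++jj)
            if (a[jj] < a[iimin]) iimin = jj;
        std::swap(a[ii], a[iimin]);           // push the minimum to its final place
    }
}
```

The two nested loops perform about N^2 / 2 comparisons on every input, so the running time is O(N^2) regardless of the initial order.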
Quicksort
Approach:
− Choose an arbitrary element (called pivot).
− Place the pivot at its final place.
− Make sure all the elements smaller than the pivot
are to the left of it, and ... (called partitioning)
− Divide-and-conquer.
void quick(int start, int end) {
    if (start < end) {
        int iipivot = partition(start, end);   // Crucially decides the complexity.
        quick(start, iipivot - 1);
        quick(iipivot + 1, end);
    }
}
[RN]
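The slide leaves partition unspecified. The sketch below uses the Lomuto scheme with the last element as the pivot, which is only one possible choice and an assumption here; the array is passed explicitly as a std::vector, and partitionLast is a hypothetical name.

```cpp
#include <utility>
#include <vector>

// Lomuto partition with a[end] as the pivot (an assumed pivot choice).
// Returns the pivot's final index; smaller elements end up to its left.
int partitionLast(std::vector<int>& a, int start, int end) {
    const int pivot = a[end];
    int boundary = start;                     // a[start..boundary-1] < pivot
    for (int jj = start; jj < end; ++jj)
        if (a[jj] < pivot) std::swap(a[boundary++], a[jj]);
    std::swap(a[boundary], a[end]);           // place the pivot at its final place
    return boundary;
}

// Same recursion as on the slide, with the array passed as a parameter.
void quick(std::vector<int>& a, int start, int end) {
    if (start < end) {
        const int iipivot = partitionLast(a, start, end);
        quick(a, start, iipivot - 1);
        quick(a, iipivot + 1, end);
    }
}
```

With this pivot choice an already sorted input produces maximally unbalanced partitions and O(N^2) time, which illustrates why the pivot decides the complexity.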
Merge Sort
Divide-and-Conquer
− Divide the array into two halves
− Sort each array separately
− Merge the two sorted sequences
Worst case complexity: O(n log n)
− Not efficient in practice due to array copying.
Homework: Write the code.
void mergeSort(int start, int end) {
    if (start < end) {
        int mid = (start + end) / 2;
        mergeSort(start, mid);
        mergeSort(mid + 1, end);
        merge(start, mid, end);
    }
}
[RN]
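The merge step is where the array copying happens. A minimal sketch follows, assuming the array is passed explicitly as a std::vector and that a temporary buffer per merge is acceptable; the signatures differ from the slide's only in that extra parameter.

```cpp
#include <vector>

// Merges the sorted runs a[start..mid] and a[mid+1..end] into one sorted run.
void merge(std::vector<int>& a, int start, int mid, int end) {
    std::vector<int> tmp;
    tmp.reserve(end - start + 1);
    int i = start, j = mid + 1;
    while (i <= mid && j <= end) {
        if (a[i] <= a[j]) tmp.push_back(a[i++]);   // <= keeps equal keys in order (stable)
        else tmp.push_back(a[j++]);
    }
    while (i <= mid) tmp.push_back(a[i++]);        // leftover from the left run
    while (j <= end) tmp.push_back(a[j++]);        // leftover from the right run
    for (int k = 0; k < static_cast<int>(tmp.size()); ++k)
        a[start + k] = tmp[k];                     // copy back: the extra cost in practice
}

void mergeSort(std::vector<int>& a, int start, int end) {
    if (start < end) {
        const int mid = (start + end) / 2;
        mergeSort(a, start, mid);
        mergeSort(a, mid + 1, end);
        merge(a, start, mid, end);
    }
}
```

Each level of recursion copies all n elements once and there are about log n levels, which gives the O(n log n) worst case.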
[from WeissBook]
The class of comparison-based
sorting algorithms – running time lower bound
Array consists of n distinct elements.
Number of orderings/permutations = n!
A sorting algorithm must distinguish between these permutations.
The number of yes/no qns. necessary to distinguish n! permutations
is at least log(n!).
− Also called information theoretic lower bound
Given: n! >= (n/2)^(n/2)
log(n!) >= (n/2) log(n/2), which is Ω(n log n)
Comparison-based sort needs 1 qn. per comparison (two numbers).
Hence it must require Ω(n log n) time.
− For each comparison-based sorting algorithm, there exists an input for which
it would take Ω(n log n) comparisons.
− Heapsort, mergesort are theoretically asymptotically optimal (subject to
constants)
[RN]
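The "given" inequality and the resulting bound can be worked out in two lines. This is a standard derivation, written here for n >= 2 with logarithms base 2:

```latex
\begin{align*}
n! &= \prod_{i=1}^{n} i
    \;\ge\; \prod_{i=\lceil n/2 \rceil}^{n} i
    \;\ge\; \left(\frac{n}{2}\right)^{n/2}
    && \text{(at least $n/2$ factors, each at least $n/2$)} \\
\log_2(n!) &\ge \frac{n}{2}\log_2\frac{n}{2}
    = \frac{n}{2}\log_2 n - \frac{n}{2}
    = \Omega(n \log n)
\end{align*}
```

The first step drops the factors below n/2, which are all at least 1, so the product can only shrink.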
Counting Sort
Bucketize elements.
Find count of elements in each bucket.
Perform prefix sum.
Copy elements from buckets to original array.
Original array:  4 1 4 9 11 7 5 1 3 4
Bucket (value):  1     2  3  4        5  6  7  8  9  10  11
Bucket contents: 1, 1  -  3  4, 4, 4  5  -  7  -  9  -   11
Bucket sizes:    2     0  1  3        1  0  1  0  1  0   1
Starting index:  0     2  2  3        6  7  7  8  8  9   9
Output array:    1 1 3 4 4 4 5 7 9 11
[RN]
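The four steps above can be sketched as follows, assuming integer keys in a known range [minKey, maxKey] with one bucket per key value (for the slide's example, 1 to 11). The function name and the choice to return a new array instead of overwriting the original are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counting sort for integer keys in [minKey, maxKey], one bucket per value.
std::vector<int> countingSort(const std::vector<int>& in, int minKey, int maxKey) {
    assert(minKey <= maxKey);
    const int B = maxKey - minKey + 1;            // number of buckets

    std::vector<int> sizes(B, 0);                 // count of elements in each bucket
    for (int v : in) {
        assert(v >= minKey && v <= maxKey);
        ++sizes[v - minKey];
    }

    std::vector<std::size_t> startIdx(B, 0);      // exclusive prefix sum of sizes
    for (int b = 1; b < B; ++b)
        startIdx[b] = startIdx[b - 1] + sizes[b - 1];

    std::vector<int> out(in.size());
    for (int v : in)                              // scanning in input order keeps it stable
        out[startIdx[v - minKey]++] = v;
    return out;
}
```

For the example array this yields bucket sizes 2 0 1 3 1 0 1 0 1 0 1 and starting indices 0 2 2 3 6 7 7 8 8 9 9, matching the table. The running time is O(N + B).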
Radix Sort
Complexity: O(P * (N + B)), where P = passes, N = elements, B = buckets
Generalization of bucket sort.
Radix sort sorts using different digits.
At every step, elements are moved to buckets based on their ith digits, starting from the least significant digit.
Homework 1: 33, 453, 124, 225, 1023, 432, 2232
Homework 2: bat, gym, cat, rat, dim, cub
Example:
Input:                       64 8 216 512 27 729 0 1 343 125
After pass 1 (ones digit):   0 1 512 343 64 125 216 27 8 729
Pass 2 buckets (tens digit):
  0: 00, 01, 08
  1: 512, 216
  2: 125, 27, 729
  4: 343
  6: 64
Pass 3 buckets (hundreds digit):
  0: 000, 001, 008, 027, 064
  1: 125
  2: 216
  3: 343
  5: 512
  7: 729
[RN]
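A least-significant-digit radix sort matching the worked example can be sketched as below. It assumes non-negative integers and base 10 (so B = 10 buckets); the function name and the use of one std::vector per bucket are choices made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// LSD radix sort for non-negative integers, base 10.
// P passes (digits of the maximum), each costing O(N + B).
void radixSort(std::vector<int>& a) {
    const int B = 10;                              // one bucket per decimal digit
    int maxVal = 0;
    for (int v : a) {
        assert(v >= 0);                            // sketch handles non-negative keys only
        if (v > maxVal) maxVal = v;
    }
    for (long long place = 1; maxVal / place > 0; place *= B) {
        std::vector<std::vector<int>> buckets(B);
        for (int v : a)                            // distribute by the current digit
            buckets[(v / place) % B].push_back(v);
        std::size_t k = 0;
        for (const auto& bucket : buckets)         // collect buckets 0..9 in order;
            for (int v : bucket) a[k++] = v;       // order within a bucket is preserved
    }
}
```

On the input 64 8 216 512 27 729 0 1 343 125 the array after the first pass is 0 1 512 343 64 125 216 27 8 729, as in the example, and three passes suffice because the maximum has three digits.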
Merge vs. Radix Sort: An exercise
for the road!
You are given a set of m strings, with each string being of
maximum length k. These strings are words over the English
alphabet Σ with |Σ| = 26.
• What is the running time of sorting these strings using merge sort?
(Beware: Each comparison is not O(1).)
• What is the running time of sorting these strings using radix sort?
(Note: How many buckets are being used here? How many rounds/passes?)
• Express above running times in terms of k, m and |Σ|, so that you
can answer the following questions:
• Which of the above two algorithms is better for English alphabet?
• Which of the two algorithms is better if your language has a very large
alphabet, i.e., will your answer change if |Σ| is not a constant?