Data Structures. MOD_5
● Time Complexity: This measures how the time to complete the sorting process grows
with the size of the input data; common notations include O(n), O(n log n), and O(n^2).
Selection Sort works as follows (a short code sketch follows the steps):
1. Initialization: Start with the first element of the array as the minimum.
2. Finding the Minimum: Iterate through the unsorted portion of the array to find the
smallest element.
3. Swapping: Swap the found minimum element with the first element of the unsorted
portion.
4. Repeat: Move the boundary of the sorted and unsorted portions one element to the right
and repeat the process until the entire array is sorted.
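The steps above translate directly into a short Python sketch (the function name selection_sort and the in-place swapping are illustrative choices, not prescribed by these notes):

def selection_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        # Steps 1 and 2: assume the first unsorted element is the minimum,
        # then scan the rest of the unsorted portion for anything smaller.
        min_index = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_index]:
                min_index = j
        # Step 3: swap the minimum into the first unsorted position.
        arr[i], arr[min_index] = arr[min_index], arr[i]
        # Step 4: the sorted boundary moves one step right on the next pass.
    return arr

For example, selection_sort([29, 10, 14, 37, 13]) returns [10, 13, 14, 29, 37].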
● Worst Case: The time complexity of Selection Sort in the worst case is O(n^2), where n
is the number of elements in the array. This is because it requires two nested loops: one
for selecting the minimum and another for iterating through the unsorted elements.
● Small Data Sets: Selection Sort is simple and has low overhead, making it suitable for
small arrays.
● Memory Constraints: It is an in-place sorting algorithm, requiring only a constant
amount of additional memory space.
● Partially Sorted Data: Selection Sort gains no speed-up from partially sorted input,
since it always performs the same number of comparisons; its main appeal in such cases
is simply its low swap count.
● Easy to Understand: The algorithm is straightforward and easy to implement, making it
a good choice for educational purposes and for those new to sorting algorithms.
● In-Place Sorting: Selection Sort requires only a constant amount of additional memory
space (O(1)), as it sorts the array without needing extra storage for another array.
● Fewer Swaps: It performs a minimal number of swaps compared to other algorithms,
which can be beneficial when the cost of swapping elements is high.
● Inefficiency: The average and worst-case time complexity is O(n^2), making it inefficient
for large datasets.
● Redundant Comparisons: Even if the array is sorted, it still goes through all the
comparisons, which can be wasteful.
Performance Comparison (Bubble Sort)
● Compared to Selection Sort: Both have a time complexity of O(n^2), but Bubble Sort
generally performs more swaps, making it slower in practice.
● Compared to More Efficient Algorithms: Algorithms like Merge Sort and Quick Sort
have a time complexity of O(n log n), making them significantly faster for larger datasets.
Merge Sort is a divide-and-conquer sorting algorithm. It works as follows (a short code
sketch follows the steps):
1. Divide:
○ Split the array into halves repeatedly until each sub-array contains a single element.
2. Merge:
○ Merge pairs of sub-arrays back together in sorted order.
○ Continue merging until all sub-arrays are combined into a single sorted array.
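Assuming a top-down implementation of these steps, a minimal Python sketch might look like this (merge_sort is an illustrative name; returning a new list rather than sorting in place is a simplification):

def merge_sort(arr):
    # Base case: lists of length 0 or 1 are already sorted.
    if len(arr) <= 1:
        return arr
    # Divide: split the list into two halves and sort each recursively.
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # Merge: repeatedly take the smaller front element of the two halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= keeps equal elements in order (stability)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged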
● The time complexity of Merge Sort is O(n log n) for all cases (best, average, and worst).
This is because:
○ The array is divided in half (log n divisions).
○ Each level of division requires O(n) time to merge the elements.
● Efficiency: Merge Sort is efficient for large datasets due to its O(n log n) time
complexity, making it faster than O(n^2) algorithms like Bubble Sort and Selection Sort.
● Stability: Merge Sort is a stable sorting algorithm, meaning that it maintains the relative
order of equal elements, which can be important in certain applications.
● External Sorting: Merge Sort is particularly useful for sorting large datasets that do not
fit into memory (external sorting). It can efficiently sort data stored on disk by dividing the
data into manageable chunks, sorting them in memory, and then merging them.
Heap Sort is a comparison-based sorting algorithm that utilizes a binary heap data structure. It
works as follows:
1. Build a Max Heap:
○ Convert the input array into a binary heap, specifically a max heap, where the
parent node is greater than or equal to its child nodes.
2. Extract Elements:
○ Repeatedly remove the largest element (the root of the max heap) and place it at
the end of the array.
○ After removing the root, restore the heap property by re-heapifying the remaining
elements.
3. Repeat:
○ Continue extracting the maximum and re-heapifying until all elements have been
placed and the array is sorted.
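A compact Python sketch of these three phases, using an explicit sift-down helper rather than a library heap (the name sift_down is an illustrative choice):

def heap_sort(arr):
    n = len(arr)

    def sift_down(root, end):
        # Restore the max-heap property for the subtree at index root,
        # looking only at positions before index end.
        while 2 * root + 1 < end:
            child = 2 * root + 1
            # Pick the larger of the two children.
            if child + 1 < end and arr[child] < arr[child + 1]:
                child += 1
            if arr[root] < arr[child]:
                arr[root], arr[child] = arr[child], arr[root]
                root = child
            else:
                return

    # 1. Build a max heap (O(n) overall).
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    # 2 and 3. Repeatedly move the maximum to the end and re-heapify the rest.
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(0, end)
    return arr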
● The time complexity of Heap Sort is O(n log n) for all cases (best, average, and worst).
This is because:
○ Building the max heap takes O(n) time.
○ Each extraction of the maximum element requires O(log n) time, and this is done
n times.
● In-Place Sorting: Heap Sort requires only a constant amount of additional memory
(O(1)), making it space-efficient.
● No Worst-Case Degradation: Unlike some algorithms (e.g., Quick Sort), Heap Sort has a
consistent O(n log n) time complexity across all cases, making it predictable in
performance.
● Not Recursive: Heap Sort can be implemented iteratively, which can be beneficial in
environments with limited stack space.
Quick Sort is a highly efficient sorting algorithm that uses a divide-and-conquer approach. It
works as follows:
1. Choose a Pivot: Select an element from the array as the pivot. Various strategies can
be used for choosing the pivot (e.g., first element, last element, median).
2. Partitioning:
○ Rearrange the array so that all elements less than the pivot are on its left, and all
elements greater than the pivot are on its right.
○ The pivot is now in its correct position in the sorted array.
3. Recursion:
○ Recursively apply the same process to the left and right sub-arrays created by
the partitioning step.
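One way to sketch these steps in Python, using the last element as the pivot (the Lomuto-style partition shown here is just one common strategy, not the only one):

def quick_sort(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low < high:
        # Partition: move everything smaller than the pivot to its left.
        pivot = arr[high]
        i = low - 1
        for j in range(low, high):
            if arr[j] < pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[high] = arr[high], arr[i + 1]
        p = i + 1   # the pivot is now in its final sorted position
        # Recurse on the left and right sub-arrays.
        quick_sort(arr, low, p - 1)
        quick_sort(arr, p + 1, high)
    return arr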
● The average-case time complexity of Quick Sort is O(n log n). This is because:
○ The array is divided into two roughly equal halves (log n divisions).
○ Each level of division requires O(n) time to perform the partitioning.
Radix Sort is a non-comparison sorting algorithm that processes keys digit by digit. It is most
useful in the following situations:
● Sorting Integers: Radix Sort is particularly effective for sorting large sets of integers or
fixed-length strings, especially when the range of values is not excessively large.
● Fixed-Length Data: It works well when the data has a fixed length, such as sorting
dates or phone numbers.
● Large Datasets: Radix Sort can outperform comparison-based algorithms when dealing
with large datasets, especially when the number of digits (d) is relatively small compared
to the number of elements (n).
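A minimal least-significant-digit (LSD) sketch for non-negative integers follows (base 10 digits and the name radix_sort are assumptions; the notes do not prescribe an implementation):

def radix_sort(arr):
    if not arr:
        return arr
    exp = 1
    max_val = max(arr)
    while max_val // exp > 0:
        # Stable bucket pass on the current decimal digit.
        buckets = [[] for _ in range(10)]
        for num in arr:
            buckets[(num // exp) % 10].append(num)
        arr = [num for bucket in buckets for num in bucket]
        exp *= 10
    return arr

Each pass costs O(n), and there is one pass per digit, giving O(d * n) overall for d-digit keys.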
Insertion Sort is a simple sorting algorithm that builds a sorted array one element at a time. It
works as follows:
1. Start with the Second Element: Assume the first element is already sorted. Begin with
the second element of the array.
2. Compare and Insert:
○ Compare the current element (key) with the elements in the sorted portion (to its
left).
○ Shift all larger elements one position to the right to make space for the key.
○ Insert the key in its correct position.
3. Repeat: Continue this process for each element in the array until the entire array is
sorted.
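The three steps above map onto a short Python sketch (insertion_sort is an illustrative name; the sort is done in place):

def insertion_sort(arr):
    # Everything to the left of index i is already sorted.
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger sorted elements one position to the right.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        # Insert the key into the gap that opened up.
        arr[j + 1] = key
    return arr

On an already sorted list the while loop never runs, which is why the best case is O(n).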
● The best-case time complexity of Insertion Sort is O(n). This occurs when the array is
already sorted, as each element only needs to be compared once with the previous
element.
● Small Datasets: Insertion Sort is efficient for small datasets due to its low overhead and
simplicity.
● Nearly Sorted Data: It performs well when the data is nearly sorted, as it requires fewer
comparisons and shifts.
● Stable Sorting: Insertion Sort is a stable sorting algorithm, making it suitable for
applications where the relative order of equal elements must be preserved.
● Online Sorting: It can sort a list as it receives it, making it useful in scenarios where
data is continuously being added.
● Hashing: Hashing is a technique used to convert data (such as a key) into a fixed-size
numerical value, known as a hash code or hash value. This process allows for efficient
data retrieval and storage.
● Purpose: Hashing is primarily used in data structures like hash tables to enable fast
data access. It allows for:
○ Constant Time Complexity: Ideally, hash tables provide O(1) average time
complexity for search, insert, and delete operations.
○ Efficient Data Management: Hashing helps manage large datasets by
distributing data uniformly across a fixed-size array, minimizing collisions.
● Input: A hash function takes an input (or key) and processes it to produce a hash value.
● Process:
○ The hash function applies mathematical operations (such as modular arithmetic)
to the input data.
○ It generates a hash code that corresponds to an index in the hash table.
● Output: The output is a fixed-size integer that represents the original input.
● Deterministic: The same input should always produce the same hash value.
● Uniform Distribution: It should distribute hash values uniformly across the hash table to
minimize collisions.
● Efficient: The computation of the hash value should be quick and require minimal
resources.
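As a toy illustration of these properties, a hash function for string keys might combine the characters and reduce the result with modular arithmetic (the multiplier 31 and the idea of passing in the table size are illustrative assumptions, not part of these notes):

def simple_hash(key, table_size):
    # Deterministic: the same key always produces the same value.
    h = 0
    for ch in str(key):
        # Mix in each character, then use modular arithmetic to keep the
        # result inside the range of valid table indices.
        h = (h * 31 + ord(ch)) % table_size
    return h

For example, simple_hash("apple", 16) always maps to the same index between 0 and 15.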
1. Array:
○ The primary structure is an array that holds the hash values (or entries). Each
index in the array corresponds to a potential hash value.
2. Hash Function:
○ A hash function is used to compute the index for each key. It takes the key as
input and returns an integer that represents the index in the array.
3. Buckets:
○ Each index in the array can hold one or more entries (key-value pairs). This is
particularly important for handling collisions.
Collisions occur when two different keys hash to the same index. Here are some common
methods to handle collisions:
1. Chaining:
○ Each index in the hash table points to a linked list (or another data structure) that
holds all entries that hash to that index.
○ When a collision occurs, the new entry is simply added to the linked list at that
index.
2. Open Addressing:
○ In this method, when a collision occurs, the algorithm searches for the next
available index in the array.
○ Common techniques for open addressing include:
■ Linear Probing: Check the next index sequentially until an empty slot is
found.
■ Quadratic Probing: Check indices at increasing intervals (e.g., 1, 4, 9,
etc.) to find an empty slot.
■ Double Hashing: Use a second hash function to determine the step size
for probing.
3. Cuckoo Hashing:
○ This method uses two hash functions and two hash tables. When a collision
occurs, the existing entry is "kicked out" and reinserted into its alternative position
based on the second hash function.
Each method has its advantages and trade-offs, and the choice of collision resolution strategy
can impact the performance of the hash table.
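To make the chaining approach concrete, here is a minimal Python sketch of a hash table whose slots hold lists of key-value pairs (the class name ChainedHashTable and the fixed size of 8 are illustrative choices):

class ChainedHashTable:
    def __init__(self, size=8):
        # Each slot (bucket) is a list that holds all colliding entries.
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Map the key to a slot index.
        return hash(key) % self.size

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already stored: update its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key (possibly a collision): extend the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

Open addressing would instead probe other slots of the same array (linearly, quadratically, or with a second hash function) rather than growing a chain.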
Characteristics of a Good Hash Function
1. Deterministic:
○ The same input should always produce the same hash value, ensuring
consistency in data retrieval.
2. Uniform Distribution:
○ A good hash function should distribute hash values uniformly across the hash
table. This minimizes clustering and reduces the likelihood of collisions.
3. Efficient Computation:
○ The hash value should be quick to compute and require minimal resources.
4. Minimizes Collisions:
○ While collisions are inevitable, a good hash function should minimize their
occurrence by producing unique hash values for different inputs as much as
possible.
5. Sensitivity to Input Changes:
○ A small change in the input should produce a noticeably different hash value, so
that similar keys do not all map to the same index.
Impact of the Hash Function on Performance
1. Collision Rate:
○ A poorly designed hash function can lead to a high collision rate, which can
degrade the performance of the hash table. More collisions mean longer search
times and increased complexity in handling them.
2. Load Factor:
○ The load factor (the ratio of the number of entries to the number of slots in the
hash table) is influenced by the hash function. A good hash function helps
maintain a lower load factor, which improves performance.
3. Access Time:
○ With a well-designed hash function, average access time stays close to O(1); a
poorly distributed one causes clustering and slows lookups.
Applications of Hashing
1. Data Deduplication:
○ Hashing helps identify duplicate data by generating hash values for data blocks.
If two blocks produce the same hash value, they are likely duplicates, allowing for
efficient storage management.
2. Cryptography:
○ Cryptographic hash functions are used to verify data integrity and to store
passwords securely.
Hashing in Data Retrieval
1. Hash Tables:
○ Hash tables, which utilize hashing, are widely used in programming for
implementing associative arrays, sets, and dictionaries, allowing for fast data
access.
○ In hash tables, keys are hashed to produce an index where the corresponding
value is stored. This allows for O(1) average time complexity for data retrieval,
insertion, and deletion.
2. Direct Address Tables:
○ Hashing can be used to create direct address tables where each key directly
maps to an index in an array, enabling constant-time access.
3. Bloom Filters:
○ Bloom filters apply several hash functions to each element to test set membership
probabilistically while using very little memory.
4. File Systems:
○ Some file systems use hashing to manage file storage and retrieval, ensuring
quick access to files based on their names or contents.
● Load Factor:
○ The load factor (denoted as α) is defined as the ratio of the number of entries (n)
to the number of slots (m) in the hash table: α = n/m.
● Impact on Performance:
○ To maintain performance, hash tables often resize (rehash) when the load factor
exceeds a certain threshold (commonly around 0.7). This involves creating a
larger array and recalculating the hash values for existing entries, which can be
costly but helps restore efficient performance.
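A small sketch of the resize logic described above, assuming a chained table and a doubling strategy (the 0.7 threshold mirrors the notes; the helper names are illustrative):

def needs_rehash(num_entries, num_slots, threshold=0.7):
    # Load factor alpha = n / m.
    return num_entries / num_slots > threshold

def rehash(old_buckets, new_size):
    # Create a larger array and recompute every key's index for it.
    new_buckets = [[] for _ in range(new_size)]
    for bucket in old_buckets:
        for key, value in bucket:
            new_buckets[hash(key) % new_size].append((key, value))
    return new_buckets

For instance, a table with 8 slots and 6 entries has alpha = 0.75 > 0.7, so it would be rebuilt with, say, 16 slots before further insertions.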