A Graph is a non-linear data structure consisting of vertices and edges. The vertices are sometimes also referred to as nodes, and the edges are lines or arcs that connect any two nodes in the graph. More formally, a Graph is composed of a set of vertices (V) and a set of edges (E). The graph is denoted by G(V, E).
# Graph Representations
In graph theory, a graph representation is a technique to store a graph in the memory of a computer.
To represent a graph, we just need the set of vertices and, for each vertex, the neighbors of that vertex (the vertices directly connected to it by an edge). If it is a weighted graph, then a weight is associated with each edge.
There are different ways to optimally represent a graph, depending on the density of its edges,
type of operations to be performed and ease of use.
1. Adjacency Matrix
• Adjacency matrix is a sequential representation.
• It is used to represent which nodes are adjacent to each other, i.e., whether there is an edge connecting a pair of nodes in the graph.
• In this representation, we construct an n×n matrix A. If there is an edge from vertex i to vertex j, then the corresponding element of A, a(i, j) = 1; otherwise a(i, j) = 0.
• If the graph is weighted, then instead of 1s and 0s, we can store the weight of the edge.
Example
2. Adjacency List
• Adjacency list is a linked representation in which, for every vertex, a list of its adjacent vertices is maintained.
Pros:
• Adjacency list saves a lot of space.
• We can easily insert or delete vertices as we use a linked list.
• Such a representation is easy to follow and clearly shows the adjacent nodes of a node.
Cons:
The adjacency list allows testing whether two vertices are adjacent to each other, but this operation is slower than with an adjacency matrix.
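As a quick, hedged sketch (not part of the original notes), the following C snippet builds both an adjacency matrix and an adjacency list for a tiny undirected graph; the vertex count V and the edges are placeholders.

```c
#include <stdio.h>
#include <stdlib.h>

#define V 4   /* number of vertices (placeholder size) */

/* Adjacency matrix: adj[i][j] = 1 when there is an edge between i and j
   (or the edge weight, for a weighted graph). */
int adj[V][V];

/* Adjacency list: one linked list of neighbours per vertex. */
struct node { int vertex; struct node *next; };
struct node *list[V];

void add_edge(int i, int j) {
    adj[i][j] = adj[j][i] = 1;              /* matrix entries (undirected) */

    struct node *n = malloc(sizeof *n);     /* prepend j to i's list */
    n->vertex = j; n->next = list[i]; list[i] = n;
    n = malloc(sizeof *n);                  /* and i to j's list */
    n->vertex = i; n->next = list[j]; list[j] = n;
}

int main(void) {
    add_edge(0, 1);
    add_edge(0, 2);
    add_edge(1, 3);
    for (int i = 0; i < V; i++) {           /* print each vertex's neighbours */
        printf("%d:", i);
        for (struct node *p = list[i]; p; p = p->next)
            printf(" %d", p->vertex);
        printf("\n");
    }
    return 0;
}
```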
# Graphs Traversal
To traverse a Graph means to start in one vertex, and go along the edges to visit other vertices
until all vertices, or as many as possible, have been visited.
The two most common ways a Graph can be traversed are:
• Depth First Search (DFS)
• Breadth First Search (BFS)
DFS is usually implemented using a Stack or by the use of recursion (which utilizes the call
stack), while BFS is usually implemented using a Queue.
Note: The Call Stack keeps functions running in the correct order. If for example FunctionA
calls FunctionB, FunctionB is placed on top of the call stack and starts running. Once
FunctionB is finished, it is removed from the stack, and then FunctionA resumes its work.
Example of BFS algorithm
In the above graph, the minimum path 'P' from Node A to Node E can be found by using BFS. The algorithm uses two queues, namely QUEUE1 and QUEUE2. QUEUE1 holds all the nodes that are to be processed, while QUEUE2 holds all the nodes that are processed and deleted from QUEUE1.
Now, let's start examining the graph starting from Node A.
Step 1 - First, add A to queue1 and NULL to queue2.
QUEUE1 = {A}
QUEUE2 = {NULL}
Step 2 - Now, delete node A from queue1 and add it into queue2. Insert all neighbors of node
A to queue1.
QUEUE1 = {B, D}
QUEUE2 = {A}
Step 3 - Now, delete node B from queue1 and add it into queue2. Insert all neighbors of node
B to queue1.
QUEUE1 = {D, C, F}
QUEUE2 = {A, B}
Step 4 - Now, delete node D from queue1 and add it into queue2. Insert all neighbors of node D to queue1. The only neighbor of Node D is F; since it is already inserted, it will not be inserted again.
QUEUE1 = {C, F}
QUEUE2 = {A, B, D}
Step 5 - Delete node C from queue1 and add it into queue2. Insert all neighbors of node C to
queue1.
QUEUE1 = {F, E, G}
QUEUE2 = {A, B, D, C}
Step 6 - Delete node F from queue1 and add it into queue2. Insert all neighbors of node F to
queue1. Since all the neighbors of node F are already present, we will not insert them again.
QUEUE1 = {E, G}
QUEUE2 = {A, B, D, C, F}
Step 7 - Delete node E from queue1 and add it into queue2. Since all of its neighbors have already been added, we will not insert them again. Now, all the nodes are visited, and the target node E is present in queue2.
QUEUE1 = {G}
QUEUE2 = {A, B, D, C, F, E}
Complexity of BFS algorithm
Time complexity of BFS depends upon the data structure used to represent the graph. The time
complexity of BFS algorithm is O(V+E), since in the worst case, BFS algorithm explores every
node and edge. In a graph, the number of vertices is O(V), whereas the number of edges is
O(E).
The space complexity of BFS can be expressed as O(V), where V is the number of vertices.
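The following C sketch implements the BFS just described over an adjacency matrix, using a simple array-based queue. The edges are reconstructed from the walkthrough above (A-B, A-D, B-C, B-F, D-F, C-E, C-G); mapping vertices A..G to indexes 0..6 is an assumption made for illustration.

```c
#include <stdio.h>

#define N 7   /* vertices A..G, mapped to 0..6 */

/* Breadth First Search over an adjacency matrix using an array-based queue;
   each vertex is marked visited when it is enqueued so it enters the queue
   only once. */
void bfs(int adj[N][N], int start) {
    int visited[N] = {0};
    int queue[N], front = 0, rear = 0;

    visited[start] = 1;
    queue[rear++] = start;                 /* enqueue the starting vertex */

    while (front < rear) {
        int v = queue[front++];            /* dequeue (delete from QUEUE1) */
        printf("%c ", 'A' + v);            /* process the vertex */
        for (int w = 0; w < N; w++)
            if (adj[v][w] && !visited[w]) {
                visited[w] = 1;
                queue[rear++] = w;         /* insert unvisited neighbours */
            }
    }
    printf("\n");
}

int main(void) {
    /* undirected edges reconstructed from the walkthrough:
       A-B, A-D, B-C, B-F, D-F, C-E, C-G */
    int edges[][2] = {{0,1},{0,3},{1,2},{1,5},{3,5},{2,4},{2,6}};
    int adj[N][N] = {{0}};
    for (int i = 0; i < 7; i++) {
        adj[edges[i][0]][edges[i][1]] = 1;
        adj[edges[i][1]][edges[i][0]] = 1;
    }
    bfs(adj, 0);   /* prints A B D C F E G */
    return 0;
}
```

With these edges, the program prints the vertices in the order A B D C F E G, which matches the order in which nodes enter QUEUE2 above.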
Depth First Search Traversal Algorithm
Depth First Search is said to go "deep" because it visits a vertex, then an adjacent vertex, and then that vertex's adjacent vertex, and so on; in this way the distance from the starting vertex increases with each recursive step.
Because of the recursive nature, stack data structure can be used to implement the DFS
algorithm. The process of implementing the DFS is similar to the BFS algorithm.
The step by step process to implement the DFS traversal is given as follows -
1. First, create a stack with the total number of vertices in the graph.
2. Now, choose any vertex as the starting point of traversal, and push that vertex onto the stack.
3. After that, push a non-visited vertex (adjacent to the vertex on the top of the stack) onto the top of the stack.
4. Repeat step 3 until no unvisited vertices are left adjacent to the vertex on the stack's top.
5. When no such vertex is left, pop a vertex from the stack and go back.
6. Repeat steps 3 to 5 until the stack is empty.
Applications of DFS algorithm
The applications of using the DFS algorithm are given as follows -
• DFS algorithm can be used to implement the topological sorting.
• It can be used to find the paths between two vertices.
• It can also be used to detect cycles in the graph.
• DFS algorithm is also used for one solution puzzles.
• DFS is used to determine if a graph is bipartite or not.
Algorithm
Step 1: SET STATUS = 1 (ready state) for each node in G
Step 2: Push the starting node A on the stack and set its STATUS = 2 (waiting state)
Step 3: Repeat Steps 4 and 5 until STACK is empty
Step 4: Pop the top node N. Process it and set its STATUS = 3 (processed state)
Step 5: Push on the stack all the neighbors of N that are in the ready state (whose STATUS =
1) and set their STATUS = 2 (waiting state)
[END OF LOOP]
Step 6: EXIT
Example of DFS algorithm
Now, let's understand the working of the DFS algorithm by using an example. In the example given below, there is a directed graph with 8 vertices (H, A, B, C, D, E, F, and G).
Now, let's start examining the graph starting from Node H.
Step 1 - First, push H onto the stack.
STACK: H
Step 2 - POP the top element from the stack, i.e., H, and print it. Now, PUSH all the neighbors
of H onto the stack that are in ready state.
Print: H
STACK: A
Step 3 - POP the top element from the stack, i.e., A, and print it. Now, PUSH all the neighbors
of A onto the stack that are in ready state.
Print: A
STACK: B, D
Step 4 - POP the top element from the stack, i.e., D, and print it. Now, PUSH all the neighbors
of D onto the stack that are in ready state.
Print: D
STACK: B, F
Step 5 - POP the top element from the stack, i.e., F, and print it. Now, PUSH all the neighbors
of F onto the stack that are in ready state.
Print: F
STACK: B
Step 6 - POP the top element from the stack, i.e., B, and print it. Now, PUSH all the neighbors
of B onto the stack that are in ready state.
Print: B
STACK: C
Step 7 - POP the top element from the stack, i.e., C, and print it. Now, PUSH all the neighbors
of C onto the stack that are in ready state.
Print: C
STACK: E, G
Step 8 - POP the top element from the stack, i.e., G and PUSH all the neighbors of G onto the
stack that are in ready state.
Print: G
STACK: E
Step 9 - POP the top element from the stack, i.e., E and PUSH all the neighbors of E onto the
stack that are in ready state.
Print: E
STACK:
Now, all the graph nodes have been traversed, and the stack is empty.
Complexity of Depth-first search algorithm
The time complexity of the DFS algorithm is O(V+E), where V is the number of vertices and
E is the number of edges in the graph.
The space complexity of the DFS algorithm is O(V).
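A minimal iterative DFS sketch in C, using an explicit stack and the three STATUS states from the algorithm above. The graph here is only a small placeholder, not the example graph.

```c
#include <stdio.h>

#define N 7   /* number of vertices (placeholder size) */

/* Iterative DFS with an explicit stack, mirroring the push/pop steps above.
   status: 0 = ready, 1 = waiting (on the stack), 2 = processed. */
void dfs(int adj[N][N], int start) {
    int status[N] = {0};
    int stack[N], top = -1;

    stack[++top] = start;
    status[start] = 1;                 /* waiting state */

    while (top >= 0) {
        int v = stack[top--];          /* pop the top node */
        printf("%c ", 'A' + v);        /* process it */
        status[v] = 2;                 /* processed state */
        for (int w = 0; w < N; w++)
            if (adj[v][w] && status[w] == 0) {
                stack[++top] = w;      /* push neighbours in ready state */
                status[w] = 1;
            }
    }
    printf("\n");
}

int main(void) {
    int adj[N][N] = {{0}};
    /* placeholder directed edges: A->B, A->D, B->C, D->F */
    adj[0][1] = adj[0][3] = adj[1][2] = adj[3][5] = 1;
    dfs(adj, 0);
    return 0;
}
```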
# Shortest Path Algorithms
Dijkstra’s Algorithm
Dijkstra’s Algorithm is a widely used and efficient algorithm for finding the shortest path
between nodes in a weighted graph. It works by maintaining a set of tentative distances to every
node and iteratively updating these distances based on the actual distances found.
This algorithm was created and published by Dr. Edsger W. Dijkstra, a brilliant Dutch computer
scientist and software engineer.
In 1959, he published a 3-page article titled "A note on two problems in connexion with graphs"
where he explained his new algorithm.
Algorithm:
1. Initialize a distance table with tentative distances to all nodes, setting the distance to the starting node as 0 and all others as infinity.
2. Set the starting node as the current node.
3. For each neighbor of the current node, calculate its tentative distance and update the distance
table if a shorter path is found.
4. Mark the current node as visited.
5. Select the unvisited node with the smallest tentative distance, set it as the new current node,
and repeat steps 3–5.
6. Continue this process until the destination node is marked as visited or until all nodes have been visited.
Applications and Use Cases
· Navigation and Maps: Dijkstra’s Algorithm is used in GPS systems to find the shortest route
between two locations.
· Networking and Routing: It’s employed in determining the optimal path for data
transmission in computer networks.
· Transportation and Logistics: Dijkstra’s Algorithm helps optimize routes for delivery
services and public transportation.
The algorithm will generate the shortest path from node 0 to all the other nodes in the graph.
Tip: For this graph, we will assume that the weight of the edges represents the distance
between two nodes.
We will have the shortest path from node 0 to node 1, from node 0 to node 2, from node 0 to
node 3, and so on for every node in the graph.
Initially, we have this list of distances (please see the list below):
• The distance from the source node to itself is 0. For this example, the source node will
be node 0 but it can be any node that you choose.
• The distance from the source node to all other nodes has not been determined yet, so
we use the infinity symbol to represent this initially.
We also have this list (see below) to keep track of the nodes that have not been visited yet
(nodes that have not been included in the path):
Tip: Remember that the algorithm is completed once all nodes have been added to the path.
Since we are choosing to start at node 0, we can mark this node as visited. Equivalently, we
cross it off from the list of unvisited nodes and add a red border to the corresponding node in
diagram:
Now we need to start checking the distance from node 0 to its adjacent nodes. As you can see,
these are nodes 1 and 2 (see the red edges):
Tip: This doesn't mean that we are immediately adding the two adjacent nodes to the
shortest path. Before adding a node to this path, we need to check if we have found the shortest
path to reach it. We are simply making an initial examination process to see the options
available.
We need to update the distances from node 0 to node 1 and node 2 with the weights of the edges
that connect them to node 0 (the source node). These weights are 2 and 6, respectively:
Now we need to analyze the new adjacent nodes to find the shortest path to reach them. We
will only analyze the nodes that are adjacent to the nodes that are already part of the shortest
path (the path marked with red edges).
Node 3 and node 2 are both adjacent to nodes that are already in the path because they are
directly connected to node 1 and node 0, respectively, as you can see below. These are the nodes
that we will analyze in the next step.
Since we already have the distance from the source node to node 2 written down in our list, we
don't need to update the distance this time. We only need to update the distance from the source
node to the new adjacent node (node 3):
Now that we have the distance to the adjacent nodes, we have to choose which node will be
added to the path. We must select the unvisited node with the shortest (currently known)
distance to the source node.
From the list of distances, we can immediately detect that this is node 2 with distance 6:
We add it to the path graphically with a red border around the node and a red edge:
We also mark it as visited by adding a small red square in the list of distances and crossing it
off from the list of unvisited nodes:
Now we need to repeat the process to find the shortest path from the source node to the new
adjacent node, which is node 3.
You can see that we have two possible paths 0 -> 1 -> 3 or 0 -> 2 -> 3. Let's see how we can
decide which one is the shortest path.
Node 3 already has a distance in the list that was recorded previously (7, see the list below).
This distance was the result of a previous step, where we added the weights 5 and 2 of the two
edges that we needed to cross to follow the path 0 -> 1 -> 3.
But now we have another alternative. If we choose to follow the path 0 -> 2 -> 3, we would
need to follow two edges 0 -> 2 and 2 -> 3 with weights 6 and 8, respectively, which represents
a total distance of 14.
Clearly, the first (existing) distance is shorter (7 vs. 14), so we will choose to keep the original
path 0 -> 1 -> 3. We only update the distance if the new path is shorter.
Therefore, we add this node to the path using the first alternative: 0 -> 1 -> 3.
We mark this node as visited and cross it off from the list of unvisited nodes:
Now we repeat the process again.
We need to check the new adjacent nodes that we have not visited so far. This time, these nodes
are node 4 and node 5 since they are adjacent to node 3.
We update the distances of these nodes to the source node, always trying to find a shorter path,
if possible:
• For node 4: the distance is 17 from the path 0 -> 1 -> 3 -> 4.
• For node 5: the distance is 22 from the path 0 -> 1 -> 3 -> 5.
Tip: Notice that we can only consider extending the shortest path (marked in red). We
cannot consider paths that will take us through edges that have not been added to the shortest
path (for example, we cannot form a path that goes through the edge 2 -> 3).
We need to choose which unvisited node will be marked as visited now. In this case, it's node
4 because it has the shortest distance in the list of distances. We add it graphically in the
diagram:
For node 5:
• The first option is to follow the path 0 -> 1 -> 3 -> 5, which has a distance of 22 from
the source node (2 + 5 + 15). This distance was already recorded in the list of distances
in a previous step.
• The second option would be to follow the path 0 -> 1 -> 3 -> 4 -> 5, which has a distance
of 23 from the source node (2 + 5 + 10 + 6).
Clearly, the first path is shorter, so we choose it for node 5.
For node 6:
• The path available is 0 -> 1 -> 3 -> 4 -> 6, which has a distance of 19 from the source
node (2 + 5 + 10 + 2).
We mark the node with the shortest (currently known) distance as visited. In this case, node 6.
And we cross it off from the list of unvisited nodes:
Only one node has not been visited yet, node 5. Let's see how we can include it in the path.
There are three different paths that we can take to reach node 5 from the nodes that have been
added to the path:
• Option 1: 0 -> 1 -> 3 -> 5 with a distance of 22 (2 + 5 + 15).
• Option 2: 0 -> 1 -> 3 -> 4 -> 5 with a distance of 23 (2 + 5 + 10 + 6).
• Option 3: 0 -> 1 -> 3 -> 4 -> 6 -> 5 with a distance of 25 (2 + 5 + 10 + 2 + 6).
We select the shortest path: 0 -> 1 -> 3 -> 5 with a distance of 22.
We mark the node as visited and cross it off from the list of unvisited nodes:
And voilà! We have the final result with the shortest path from node 0 to each node in the
graph.
In the diagram, the red lines mark the edges that belong to the shortest path. You need to follow
these edges to follow the shortest path to reach a given node in the graph starting from node 0.
For example, if you want to reach node 6 starting from node 0, you just need to follow the red edges and you will be following the shortest path 0 -> 1 -> 3 -> 4 -> 6 automatically.
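A compact C sketch of Dijkstra's algorithm over an adjacency matrix. The edge weights below are reconstructed from the worked example above (e.g. 0-1 = 2, 0-2 = 6, 1-3 = 5); treating the graph as undirected is an assumption.

```c
#include <stdio.h>
#include <limits.h>

#define N 7   /* nodes 0..6, as in the walkthrough */

/* Dijkstra's algorithm over a weight matrix (0 = no edge); dist[] ends up
   holding the shortest known distance from the source to every node. */
void dijkstra(int w[N][N], int src, int dist[N]) {
    int visited[N] = {0};
    for (int i = 0; i < N; i++) dist[i] = INT_MAX;
    dist[src] = 0;

    for (int iter = 0; iter < N; iter++) {
        /* pick the unvisited node with the smallest tentative distance */
        int u = -1;
        for (int i = 0; i < N; i++)
            if (!visited[i] && (u == -1 || dist[i] < dist[u])) u = i;
        if (u == -1 || dist[u] == INT_MAX) break;   /* only unreachable nodes left */
        visited[u] = 1;

        /* relax every edge leaving u */
        for (int v = 0; v < N; v++)
            if (w[u][v] && !visited[v] && dist[u] + w[u][v] < dist[v])
                dist[v] = dist[u] + w[u][v];
    }
}

int main(void) {
    /* edge weights reconstructed from the walkthrough above */
    int w[N][N] = {{0}};
    w[0][1] = w[1][0] = 2;  w[0][2] = w[2][0] = 6;
    w[1][3] = w[3][1] = 5;  w[2][3] = w[3][2] = 8;
    w[3][4] = w[4][3] = 10; w[3][5] = w[5][3] = 15;
    w[4][5] = w[5][4] = 6;  w[4][6] = w[6][4] = 2;
    w[5][6] = w[6][5] = 6;

    int dist[N];
    dijkstra(w, 0, dist);
    for (int i = 0; i < N; i++)
        printf("node %d: %d\n", i, dist[i]);
    return 0;
}
```

Running it with source node 0 reproduces the final distances from the walkthrough: 0, 2, 6, 7, 17, 22 and 19.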
# Searching
Searching is the fundamental process of locating a specific element or item within a collection
of data. This collection of data can take various forms, such as arrays, lists, trees, or other
structured representations.
Importance of Searching in DSA
• Efficiency: Efficient searching algorithms improve program performance.
• Data Retrieval: Quickly find and retrieve specific data from large datasets.
• Database Systems: Enables fast querying of databases.
• Problem Solving: Used in a wide range of problem-solving tasks.
Searching Algorithms:
1. Linear Search
2. Binary Search
3. Ternary Search
4. Jump Search
5. Interpolation Search
6. Fibonacci Search
7. Exponential Search
1) Linear Search
Linear search is a type of sequential searching algorithm. In this method, every element within
the input array is traversed and compared with the key element to be found. If a match is found
in the array the search is said to be successful; if there is no match found the search is said to
be unsuccessful and gives the worst-case time complexity.
Linear Search Algorithm
The algorithm for linear search is relatively simple. The procedure starts at the very first index
of the input array to be searched.
Step 1 − Start from the 0th index of the input array, compare the key value with the value
present in the 0th index.
Step 2 − If the value matches with the key, return the position at which the value was found.
Step 3 − If the value does not match with the key, compare the next element in the array.
Step 4 − Repeat Step 3 until there is a match found. Return the position at which the match
was found.
Step 5 − If it is an unsuccessful search, print that the element is not present in the array and
exit the program.
Pseudocode
procedure linear_search (list, value)
   for each item in the list
      if item == value then
         return the item's location
      end if
   end for
   return not found
end procedure
Analysis
Linear search traverses every element sequentially; therefore, the best case is when the element is found in the very first iteration. The best-case time complexity would be O(1).
However, the worst case of the linear search method would be an unsuccessful search that does not find the key value in the array; in that case it performs n iterations. Therefore, the worst-case time complexity of the linear search algorithm would be O(n).
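A runnable C version of the pseudocode above. Only the values 34, 27, and 47 appear in the worked example, so the remaining array elements are placeholders.

```c
#include <stdio.h>

/* Linear search: scan the array from index 0 and return the position of
   the key, or -1 if it is absent. */
int linear_search(const int a[], int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;        /* match found at index i */
    return -1;               /* unsuccessful search */
}

int main(void) {
    /* 34, 27 and 47 are from the example; the other values are placeholders */
    int a[] = {34, 10, 66, 27, 47, 89};
    int pos = linear_search(a, 6, 47);
    printf("47 found at index %d\n", pos);
    return 0;
}
```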
Example
Let us look at the step-by-step searching of the key element (say 47) in an array using the linear
search method.
Step 1
The linear search starts from the 0th index. Compare the key element with the value in the 0th
index, 34.
Step 4
Now the element in 3rd index, 27, is compared with the key value, 47. They are not equal so
the algorithm is pushed forward to check the next element.
Step 5
Comparing the element in the 4th index of the array, 47, with the key 47, we find that both elements match. The position at which 47 is present, i.e., 4, is returned.
2) Binary Search
Binary search works on a sorted array by repeatedly comparing the key with the middle element and discarding the half of the array that cannot contain it.
Pseudocode
procedure binary_search (A, n, x)
   set lowerBound = 0, upperBound = n - 1
   while lowerBound <= upperBound
      set midPoint = lowerBound + (upperBound - lowerBound) / 2
      if A[midPoint] < x
         set lowerBound = midPoint + 1
      if A[midPoint] > x
         set upperBound = midPoint - 1
      if A[midPoint] = x
         EXIT: x found at location midPoint
   end while
end procedure
Example
For a binary search to work, it is mandatory for the target array to be sorted. We shall learn the
process of binary search with a pictorial example. The following is our sorted array and let us
assume that we need to search the location of value 31 using binary search.
Now we compare the value stored at location 4, with the value being searched, i.e. 31. We find
that the value at location 4 is 27, which is not a match. As the value is greater than 27 and we
have a sorted array, so we also know that the target value must be in the upper portion of the
array.
We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high - low) / 2
Our new mid is 7 now. We compare the value stored at location 7 with our target value 31.
The value stored at location 7 is not a match; rather, it is greater than what we are looking for. So, the value must be in the lower part from this location. We therefore set high = mid - 1, compute the new mid, and repeat, halving the search space each time until the value 31 is found.
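A runnable C sketch of the iterative binary search described above. The sorted array is an assumed placeholder chosen to be consistent with the example (27 at location 4, target value 31).

```c
#include <stdio.h>

/* Iterative binary search over a sorted array; returns the index of x or
   -1 if it is not present. */
int binary_search(const int a[], int n, int x) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* avoids overflow of (low + high) */
        if (a[mid] == x)
            return mid;                     /* x found at location mid */
        else if (a[mid] < x)
            low = mid + 1;                  /* search the upper half */
        else
            high = mid - 1;                 /* search the lower half */
    }
    return -1;                              /* x does not exist in the array */
}

int main(void) {
    /* placeholder sorted array consistent with the example above */
    int a[] = {10, 14, 19, 26, 27, 31, 33, 35, 42, 44};
    printf("31 found at index %d\n", binary_search(a, 10, 31));
    return 0;
}
```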
# Sorting
Quick Sort
Quicksort is a divide-and-conquer sorting algorithm.
Divide: In the divide step, first pick a pivot element. After that, partition or rearrange the array into two sub-arrays such that each element in the left sub-array is less than or equal to the pivot element and each element in the right sub-array is larger than the pivot element.
Quicksort picks an element as pivot, and then it partitions the given array around the picked
pivot element. In quick sort, a large array is divided into two arrays in which one holds values
that are smaller than the specified value (Pivot), and another array holds the values that are
greater than the pivot.
After that, left and right sub-arrays are also partitioned using the same approach. It will
continue until the single element remains in the sub-array.
Picking a good pivot is necessary for a fast implementation of quicksort. However, it is difficult to determine a good pivot in general. Some of the ways of choosing a pivot are as follows -
• The pivot can be random, i.e., select a random element of the given array as the pivot.
• The pivot can be either the rightmost element or the leftmost element of the given array.
• Select the median as the pivot element.
Algorithm:
QUICKSORT (A, start, end)
{
   if (start < end)
   {
      p ← PARTITION (A, start, end)
      QUICKSORT (A, start, p - 1)
      QUICKSORT (A, p + 1, end)
   }
}
Partition Algorithm:
PARTITION (A, start, end)
{
   pivot ← A[end]
   i ← start - 1
   for j ← start to end - 1
   {
      if (A[j] < pivot)
      {
         i ← i + 1
         swap A[i] with A[j]
      }
   }
   swap A[i + 1] with A[end]
   return i + 1
}
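A runnable C version of the QUICKSORT and PARTITION pseudocode above (Lomuto partitioning with the last element as pivot). The sample values are the ones mentioned in the walkthrough below, in an assumed order.

```c
#include <stdio.h>

/* Swap helper used by the partition step. */
static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Lomuto partition: uses the last element as the pivot and returns its
   final position. */
static int partition(int a[], int start, int end) {
    int pivot = a[end];
    int i = start - 1;
    for (int j = start; j < end; j++)
        if (a[j] < pivot)
            swap(&a[++i], &a[j]);
    swap(&a[i + 1], &a[end]);
    return i + 1;
}

static void quicksort(int a[], int start, int end) {
    if (start < end) {
        int p = partition(a, start, end);
        quicksort(a, start, p - 1);   /* sort the left sub-array */
        quicksort(a, p + 1, end);     /* sort the right sub-array */
    }
}

int main(void) {
    int a[] = {24, 9, 29, 14, 19, 27};   /* walkthrough values, assumed order */
    int n = 6;
    quicksort(a, 0, n - 1);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```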
To understand the working of quick sort, let's take an unsorted array. It will make the concept
more clear and understandable.
Let the elements of array are -
In the given array, we consider the leftmost element as pivot. So, in this case, a[left] = 24,
a[right] = 27 and a[pivot] = 24.
Since, pivot is at left, so algorithm starts from right and move towards left.
Now, a[pivot] < a[right], so algorithm moves forward one position towards left, i.e. -
Because, a[pivot] > a[right], so, algorithm will swap a[pivot] with a[right], and pivot moves
to right, as -
Now, a[left] = 19, a[right] = 24, and a[pivot] = 24. Since, pivot is at right, so algorithm starts
from left and moves to right.
As a[pivot] > a[left], so algorithm moves one position to right as -
Now, a[left] = 9, a[right] = 24, and a[pivot] = 24. As a[pivot] > a[left], so algorithm moves one
position to right as -
Now, a[left] = 29, a[right] = 24, and a[pivot] = 24. As a[pivot] < a[left], so, swap a[pivot] and
a[left], now pivot is at left, i.e. -
Since, pivot is at left, so algorithm starts from right, and move to left. Now, a[left] = 24, a[right]
= 29, and a[pivot] = 24. As a[pivot] < a[right], so algorithm moves one position to left, as -
Now, a[pivot] = 24, a[left] = 24, and a[right] = 14. As a[pivot] > a[right], so, swap a[pivot] and
a[right], now pivot is at right, i.e. -
Now, a[pivot] = 24, a[left] = 14, and a[right] = 24. Pivot is at right, so the algorithm starts from
left and move to right.
Now, a[pivot] = 24, a[left] = 24, and a[right] = 24. So, pivot, left and right are pointing the
same element. It represents the termination of procedure.
Element 24, which is the pivot element is placed at its exact position.
Elements that are right side of element 24 are greater than it, and the elements that are left side
of element 24 are smaller than it.
Now, in a similar manner, quick sort algorithm is separately applied to the left and right sub-
arrays. After sorting gets done, the array will be -
Quicksort complexity
1. Time Complexity
Case            Time Complexity
Best Case       O(n log n)
Average Case    O(n log n)
Worst Case      O(n²)
Bubble Sort
Bubble sort repeatedly compares adjacent elements and swaps them if they are in the wrong order. Consider the first pass over the unsorted array 13, 32, 26, 35, 10.
Here, 32 is greater than 13 (32 > 13), so this pair is already sorted. Now, compare 32 with 26.
Here, 26 is smaller than 32. So, swapping is required. After swapping, the new array will look like -
Here, 35 is greater than 32. So, there is no swapping required as they are already sorted.
Now, the comparison will be between 35 and 10.
Here, 10 is smaller than 35, so they are not in order and swapping is required. Now, we reach the end of the array. After the first pass, the array will be -
Here, 10 is smaller than 26. So, swapping is required. After swapping, the array will be -
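A runnable C sketch of bubble sort, using the values traced in the pass above.

```c
#include <stdio.h>

/* Bubble sort: repeatedly compare adjacent elements and swap them when
   they are out of order; after each pass the largest remaining value has
   "bubbled" to the end. */
void bubble_sort(int a[], int n) {
    for (int pass = 0; pass < n - 1; pass++)
        for (int i = 0; i < n - 1 - pass; i++)
            if (a[i] > a[i + 1]) {          /* adjacent pair out of order */
                int t = a[i];
                a[i] = a[i + 1];
                a[i + 1] = t;
            }
}

int main(void) {
    int a[] = {13, 32, 26, 35, 10};   /* values from the pass traced above */
    bubble_sort(a, 5);
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```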
2. Space Complexity
The space complexity of bubble sort is O(1), because only a single additional variable is needed for swapping.
Merge Sort
According to the merge sort, first divide the given array into two equal halves. Merge sort
keeps dividing the list into equal parts until it cannot be further divided.
As there are eight elements in the given array, so it is divided into two arrays of size 4.
Now, again divide these two arrays into halves. As they are of size 4, so divide them into new
arrays of size 2.
Now, again divide these arrays to get the atomic value that cannot be further divided.
In the next iteration of combining, now compare the arrays with two data values and merge them into arrays of four values in sorted order.
Now, there is a final merging of the arrays. After the final merging of above arrays, the array
will look like -
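A compact C sketch of merge sort following the divide-and-merge steps above; the eight sample values are placeholders, since the notes' own array is not shown.

```c
#include <stdio.h>
#include <string.h>

/* Merge the two sorted halves a[left..mid) and a[mid..right) using the
   auxiliary array tmp, then copy the result back into a. */
void merge(int a[], int tmp[], int left, int mid, int right) {
    int i = left, j = mid, k = left;
    while (i < mid && j < right)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid)   tmp[k++] = a[i++];   /* copy any leftovers */
    while (j < right) tmp[k++] = a[j++];
    memcpy(a + left, tmp + left, (right - left) * sizeof(int));
}

/* Keep dividing until a single element remains, then merge back up. */
void merge_sort(int a[], int tmp[], int left, int right) {
    if (right - left < 2) return;          /* one element: already sorted */
    int mid = left + (right - left) / 2;
    merge_sort(a, tmp, left, mid);
    merge_sort(a, tmp, mid, right);
    merge(a, tmp, left, mid, right);
}

int main(void) {
    int a[] = {12, 31, 25, 8, 32, 17, 40, 42};   /* placeholder values */
    int tmp[8];
    merge_sort(a, tmp, 0, 8);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```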
2. Space Complexity
The space complexity of merge sort is O(n), since merging requires an auxiliary array of the same size as the input.
Selection Sort
In selection sort, the smallest value among the unsorted elements of the array is selected in every pass and placed at its appropriate position in the sorted part of the array.
Now, for the first position in the sorted array, the entire array is to be scanned sequentially.
At present, 12 is stored at the first position. After searching the entire array, it is found that 8 is the smallest value.
So, swap 12 with 8. After the first iteration, 8 will appear at the first position in the sorted
array.
For the second position, where 29 is stored presently, we again sequentially scan the rest of the items of the unsorted array. After scanning, we find that 12 is the second lowest element in the array and should appear at the second position.
Now, swap 29 with 12. After the second iteration, 12 will appear at the second position in the
sorted array. So, after two iterations, the two smallest values are placed at the beginning in a
sorted way.
The same process is applied to the rest of the array elements. Now, we are showing a pictorial
representation of the entire sorting process.
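A runnable C sketch of selection sort. Only 12, 29, and 8 appear in the trace above, so the remaining values are placeholders.

```c
#include <stdio.h>

/* Selection sort: on every pass, find the smallest element of the unsorted
   part and swap it into the next position of the sorted part. */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)    /* scan the unsorted remainder */
            if (a[j] < a[min])
                min = j;
        int t = a[i]; a[i] = a[min]; a[min] = t;   /* place the minimum */
    }
}

int main(void) {
    /* 12, 29 and 8 appear in the trace; the rest are placeholders */
    int a[] = {12, 29, 25, 8, 32, 17, 40};
    selection_sort(a, 7);
    for (int i = 0; i < 7; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```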
2. Space Complexity
The space complexity of selection sort is O(1), as only one extra variable is required for swapping.
# Hashing
Hashing is the technique/ process of mapping key: value pairs by calculating a Hash code
using the Hash Function. When given a (key: value) pair, the Hash Function calculates a small
integer value from the key. The obtained integer is called the Hash value/ Hash code and acts
as the index to store the corresponding value inside the Hash Table.
If for two (key: value) pairs the same index is obtained after applying the Hash Function, this condition is called a Collision. We need to choose a Hash Function such that collisions occur as rarely as possible.
Terminology:
1. Hashing: The whole process
2. Hash value/ code: The index in the Hash Table for storing the value obtained after
computing the Hash Function on the corresponding key.
3. Hash Table: The data structure associated with hashing in which keys are mapped with
values stored in the array.
4. Hash Function/ Hash: The mathematical function to be applied on keys to obtain
indexes for their corresponding values into the Hash Table.
Types of Hash Functions in C
There are four Hash Functions we can choose from, based on the key being numeric or alphanumeric:
1. Division Method
2. Mid Square Method
3. Folding Method
4. Multiplication Method
1. Division Method:
Say that we have a Hash Table of size 'S', and we want to store a (key, value) pair in the Hash
Table. The Hash Function, according to the Division method, would be:
H(key) = key mod M
• Here M is an integer value used for calculating the Hash value. Usually, the table size S itself is used as M, so that every computed index falls inside the table.
• This is the simplest and easiest method to obtain a Hash value.
• The best practice is using this method when M is a prime number, as we can distribute
all the keys uniformly.
• It is also fast as it requires only one computation - modulus.
Let us now take an example to understand the cons of this method:
Size of the Hash Table = 5 (M, S)
Key: Value pairs: {10: "Sudha", 11: "Venkat", 12: "Jeevani"}
For every pair:
1. {10: "Sudha"}
Key mod M = 10 mod 5 = 0
2. {11: "Venkat"}
Key mod M = 11 mod 5 = 1
3. {12: "Jeevani"}
Key mod M = 12 mod 5 = 2
Observe that the Hash values were consecutive. This is the disadvantage of this type of Hash Function: we get consecutive indexes for consecutive keys, which can lead to clustering and poor performance. We also need to analyze many such consequences while choosing the Hash Table size.
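A minimal C sketch of the division method, using the table size and keys from the example above.

```c
#include <stdio.h>

#define M 5   /* table size used in the example above */

/* Division-method hash: index = key mod M. */
int hash_division(int key) {
    return key % M;
}

int main(void) {
    int keys[] = {10, 11, 12};    /* keys from the example */
    for (int i = 0; i < 3; i++)
        printf("h(%d) = %d\n", keys[i], hash_division(keys[i]));
    return 0;
}
```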
2. Mid Square Method:
It is a two-step process of computing the Hash value. Given a {key: value} pair, the Hash
Function would be calculated by:
1. k*k, or square the value of k
2. Choose some digits from the middle of the number to obtain the Hash value.
Formula:
h(k) = the middle digits of (k × k)
We should choose the number of digits to extract based on the size of the Hash Table. Suppose
the Hash Table size is 100; indexes will range from 0 to 99. Hence, we should select 2 digits
from the middle.
Suppose the size of the Hash Table is 10 and the key: value pairs are:
{10: "Sudha, 11: "Venkat", 12: "Jeevani"}
Number of digits to be selected: Indexes: (0 - 9), so 1
H(10) = 10 * 10 = 100 = 0
H(11) = 11 * 11 = 121 = 2
H(12) = 12 * 12 = 144 = 4
• All the digits in the key are utilized to contribute to the index, thus increasing the
performance of the Data Structure.
• If the key is a large value, squaring it further increases the value, which is considered
the con.
• Collisions might occur, too, but we can try to reduce or handle them.
• Another important point here is that, with huge numbers, we need to take care of overflow conditions. For example, if we take a 6-digit key, squaring it gives a 12-digit number that exceeds the range of defined integers. We can use a long int or a string-multiplication technique in that case.
3. Folding Method
Given a {key: value} pair and a table of size 100 (indexes 0 - 99), the key is broken down into segments of 2 digits each, except the last segment, which can have fewer digits.
Now, the Hash Function would be:
1. H(x) = (sum of the equal-sized segments) mod (size of the Hash Table)
A final carry, if any, can be ignored while calculating the Hash value.
For example, if "k" is a 10-digit key and the size of the table is 100 (0 - 99), k is divided into five segments of two digits each; these segments are added together and the sum is taken mod 100.
4. Multiplication method
Unlike the three methods above, this method has more steps involved:
1. We must choose a constant between 0 and 1, say, A.
2. Multiply the key with the chosen A.
3. Now, take the fractional part from the product and multiply it by the table size.
4. The Hash will be the floor (only the integer part) of the above result.
So, the Hash Function under this method will be:
H(x) = floor(S × ((key × A) mod 1)), where S is the size of the Hash Table
• It is considered best practice to use the multiplication method when the Hash Table size
is a power of 2 as it makes the access and all the operations faster.
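A minimal C sketch of the multiplication method. The table size (a power of 2) and the constant A ≈ 0.618 (a commonly suggested choice) are assumptions.

```c
#include <stdio.h>
#include <math.h>

#define S 16                      /* table size; a power of 2, as recommended above */
#define A_CONST 0.6180339887      /* constant between 0 and 1 (assumed choice) */

/* Multiplication-method hash: take the fractional part of key*A and
   scale it by the table size. */
int hash_multiplication(int key) {
    double frac = fmod(key * A_CONST, 1.0);   /* fractional part of key*A */
    return (int)floor(S * frac);
}

int main(void) {
    printf("h(123) = %d\n", hash_multiplication(123));
    return 0;
}
```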
# Collision in Hashing
In this, the hash function is used to find the index of the array. The hash value is used to create
an index for the key in the hash table. The hash function may return the same hash value for
two or more keys. When two or more keys have the same hash value, a collision happens. To
handle this collision, we use collision resolution techniques.
Collision Resolution Techniques (CRT)
There are two types of collision resolution techniques.
• Separate chaining (open hashing)
• Open addressing (closed hashing)
Separate chaining: This method involves making a linked list out of the slot where the
collision happened, then adding the new key to the list. Separate chaining is the term used to
describe how this connected list of slots resembles a chain. It is more frequently utilized when
we are unsure of the number of keys to add or remove.
Time complexity
• Its worst-case complexity for searching is O(n).
• Its worst-case complexity for deletion is O(n).
Advantages of separate chaining
• It is easy to implement.
• The hash table never fills up, so we can always add more elements to the chain.
• It is less sensitive to the hash function used.
Disadvantages of separate chaining
• In this, the cache performance of chaining is not good.
• Memory wastage is too much in this method.
• It requires more space for element links.
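A minimal C sketch of separate chaining with a division-method hash; the table size and the sample keys are placeholders.

```c
#include <stdio.h>
#include <stdlib.h>

#define M 10   /* table size (placeholder) */

/* Separate chaining: every slot holds a linked list (chain) of the keys
   that hash to it, so a collision simply extends the chain. */
struct node { int key; struct node *next; };
struct node *table[M];

int hash(int key) { return key % M; }   /* division-method hash (assumed) */

void insert(int key) {
    int index = hash(key);
    struct node *n = malloc(sizeof *n);
    n->key = key;
    n->next = table[index];   /* prepend to the chain at this slot */
    table[index] = n;
}

int search(int key) {
    for (struct node *p = table[hash(key)]; p; p = p->next)
        if (p->key == key) return 1;
    return 0;                 /* worst case walks the whole chain: O(n) */
}

int main(void) {
    insert(15); insert(25); insert(35);   /* all collide at index 5 */
    printf("%d %d\n", search(25), search(45));
    return 0;
}
```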
Open addressing: To handle collisions in the hash table, open addressing is employed as a collision-resolution technique. No key is kept anywhere else besides the hash table itself. As a result, the size of the hash table is always greater than or equal to the number of keys. It is also known as closed hashing.
The following techniques are used in open addressing:
• Linear probing
• Quadratic probing
• Double hashing
Linear Probing
Linear probing is one of the forms of open addressing. As we know, each cell in the hash table contains a key-value pair, so when a collision occurs because a new key maps to a cell already occupied by another key, the linear probing technique searches for the closest free location and adds the new key to that empty cell. In this case, searching is performed sequentially, starting from the position where the collision occurs, until an empty cell is found.
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5 respectively. The calculated index
value of 11 is 5 which is already occupied by another key value, i.e., 6. When linear probing is
applied, the nearest empty cell to the index 5 is 6; therefore, the value 11 will be added at the
index 6.
The next key value is 13. The index value associated with this key value is 9 when hash function
is applied. The cell is already filled at index 9. When linear probing is applied, the nearest
empty cell to the index 9 is 0; therefore, the value 13 will be added at the index 0.
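A C sketch of the linear probing insert described above. The notes do not state the hash function used here, but h(k) = (2k + 3) mod 10, the h1 given later for double hashing, reproduces the indexes in this example, so it is assumed below.

```c
#include <stdio.h>

#define M 10      /* table size, matching the 0..9 indexes in the example */
#define EMPTY -1

/* Linear probing insert: start at the home slot h(k) and step forward one
   cell at a time until an empty cell is found. */
void insert_linear(int table[M], int key) {
    int index = (2 * key + 3) % M;          /* assumed hash function */
    for (int i = 0; i < M; i++) {
        int probe = (index + i) % M;
        if (table[probe] == EMPTY) {
            table[probe] = key;
            return;
        }
    }
    printf("table is full, %d not inserted\n", key);
}

int main(void) {
    int table[M];
    for (int i = 0; i < M; i++) table[i] = EMPTY;
    int keys[] = {3, 2, 9, 6, 11, 13};      /* keys from the example */
    for (int i = 0; i < 6; i++) insert_linear(table, keys[i]);
    for (int i = 0; i < M; i++) printf("%d: %d\n", i, table[i]);
    return 0;
}
```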
Quadratic Probing
Quadratic probing resolves a collision on a key k by inserting the key at the first free location among (u + i²) % m for i = 0 to m-1, where u is the index computed by the hash function and m is the size of the hash table.
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5, respectively. We do not need to
apply the quadratic probing technique on these key values as there is no occurrence of the
collision.
The index value of 11 is 5, but this location is already occupied by the 6. So, we apply the
quadratic probing technique.
When i = 0
Index = (5 + 0²) % 10 = 5
When i = 1
Index = (5 + 1²) % 10 = 6
Since location 6 is empty, the value 11 is added at index 6.
The next element is 13. When the hash function is applied on 13, the index value comes out to be 9. At index 9, the cell is occupied by another value, i.e., 3. So, we will apply the quadratic probing technique to calculate the free location.
When i = 0
Index = (9 + 0²) % 10 = 9
When i = 1
Index = (9 + 1²) % 10 = 0
Since location 0 is empty, the value 13 is added at index 0.
The next element is 7. When the hash function is applied on 7, the index value comes out to be 7. At index 7, the cell is occupied by another value, i.e., 2. So, we will apply the quadratic probing technique to calculate the free location.
When i = 0
Index = (7 + 0²) % 10 = 7
When i = 1
Index = (7 + 1²) % 10 = 8
Since location 8 is empty, so the value 7 will be added at the index 8.
The next element is 12. When the hash function is applied on 12, the index value comes out to be 7. When we observe the hash table, we see that the cell at index 7 is already occupied by the value 2. So, we apply the quadratic probing technique on 12 to determine the free location.
When i = 0
Index = (7 + 0²) % 10 = 7
When i = 1
Index = (7 + 1²) % 10 = 8
When i = 2
Index = (7 + 2²) % 10 = 1
When i = 3
Index = (7 + 3²) % 10 = 6
When i = 4
Index = (7 + 4²) % 10 = 3
Since location 3 is empty, the value 12 is stored at index 3.
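A C sketch of quadratic probing that reproduces the placements above, again assuming h(k) = (2k + 3) mod 10.

```c
#include <stdio.h>

#define M 10
#define EMPTY -1

/* Quadratic probing insert: probe (u + i*i) % M for i = 0..M-1, where u is
   the home slot computed by the (assumed) hash function. Returns the index
   used, or -1 if no free slot is found. */
int insert_quadratic(int table[M], int key) {
    int u = (2 * key + 3) % M;
    for (int i = 0; i < M; i++) {
        int probe = (u + i * i) % M;
        if (table[probe] == EMPTY) {
            table[probe] = key;
            return probe;
        }
    }
    return -1;
}

int main(void) {
    int table[M];
    for (int i = 0; i < M; i++) table[i] = EMPTY;
    int keys[] = {3, 2, 9, 6, 11, 13, 7, 12};   /* keys from the example */
    for (int i = 0; i < 8; i++)
        printf("key %d -> index %d\n", keys[i], insert_quadratic(table, keys[i]));
    return 0;
}
```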
Double Hashing
Double hashing is an open addressing technique which is used to resolve collisions. When a collision occurs, this technique applies a secondary hash function to the key and uses the resulting value as the step size, moving forward by that amount until an empty location is found.
In double hashing, two hash functions are used. Suppose h1(k) is one of the hash functions used
to calculate the locations whereas h2(k) is another hash function. It can be defined as "insert ki
at first free place from (u+v*i)%m where i=(0 to m-1)". In this case, u is the location computed
using the hash function and v is equal to (h2(k)%m).
h1(k) = 2k+3
h2(k) = 3k+1
As we know that no collision would occur while inserting the keys (3, 2, 9, 6), so we will not
apply double hashing on these key values.
On inserting the key 11 in a hash table, collision will occur because the calculated index value
of 11 is 5 which is already occupied by some another value. Therefore, we will apply the double
hashing technique on key 11. When the key value is 11, the value of v is 4.
When i=0
Index = (5+4*0)%10 =5
When i=1
Index = (5+4*1)%10 = 9
When i=2
Index = (5+4*2)%10 = 3
Since the location 3 is empty in a hash table; therefore, the key 11 is added at the index 3.
The next element is 13. The calculated index value of 13 is 9 which is already occupied by
some another key value. So, we will use double hashing technique to find the free location.
The value of v is 0.
When i=0
Index = (9+0*0)%10 = 9
We will get the value 9 in all the iterations from 0 to m-1, as the value of v is zero. Therefore, we cannot insert 13 into the hash table.
The next element is 7. The calculated index value of 7 is 7 which is already occupied by some
another key value. So, we will use double hashing technique to find the free location. The value
of v is 2.
When i=0
Index = (7 + 2*0)%10 = 7
When i=1
Index = (7+2*1)%10 = 9
When i=2
Index = (7+2*2)%10 = 1
When i=3
Index = (7+2*3)%10 = 3
When i=4
Index = (7+2*4)%10 = 5
When i=5
Index = (7+2*5)%10 = 7
When i=6
Index = (7+2*6)%10 = 9
When i=7
Index = (7+2*7)%10 = 1
When i=8
Index = (7+2*8)%10 = 3
When i=9
Index = (7+2*9)%10 = 5
We have checked all the cases of i (from 0 to 9), but we do not find a suitable place to insert 7. Therefore, key 7 cannot be inserted into the hash table.
The next element is 12. The calculated index value of 12 is 7 which is already occupied by
some another key value. So, we will use double hashing technique to find the free location.
The value of v is 7.
When i=0
Index = (7+7*0)%10 = 7
When i=1
Index = (7+7*1)%10 = 4
Since the location 4 is empty; therefore, the key 12 is inserted at the index 4.
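Finally, a C sketch of the double hashing insert with h1(k) = 2k + 3 and h2(k) = 3k + 1 as given above. As in the walkthrough, keys 13 and 7 fail to find a free slot and are not inserted.

```c
#include <stdio.h>

#define M 10
#define EMPTY -1

/* Double hashing insert: probe (u + v*i) % M for i = 0..M-1, where
   u = h1(k) % M and v = h2(k) % M. Returns the index used, or -1 when no
   free slot is reachable. */
int insert_double(int table[M], int key) {
    int u = (2 * key + 3) % M;   /* h1(k) = 2k + 3 */
    int v = (3 * key + 1) % M;   /* h2(k) = 3k + 1 */
    for (int i = 0; i < M; i++) {
        int probe = (u + v * i) % M;
        if (table[probe] == EMPTY) {
            table[probe] = key;
            return probe;
        }
    }
    return -1;   /* e.g. keys 13 and 7 above cannot be placed */
}

int main(void) {
    int table[M];
    for (int i = 0; i < M; i++) table[i] = EMPTY;
    int keys[] = {3, 2, 9, 6, 11, 13, 7, 12};   /* keys from the example */
    for (int i = 0; i < 8; i++)
        printf("key %d -> index %d\n", keys[i], insert_double(table, keys[i]));
    return 0;
}
```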