DSA Lab 11 Hashing
Session 11
Course: Data Structures (CL2001) Semester: Fall 2024
Instructor: Alishba Subhani T.A:
HASHING
Hashing is the process of generating a fixed-size output from an input of variable size
using mathematical formulas known as hash functions. This technique determines an
index or location for storing an item in a data structure.
Components Of Hashing:
There are three major components of hashing:
1. Key: A key can be any value, such as a string or an integer, that is fed as input to the
hash function, which determines an index or location for storing the item in a data
structure.
2. Hash Function: The hash function receives the input key and returns the index of an
element in an array called a hash table. This index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values using a
hash function. It stores data associatively in an array, where each value is placed at
the index computed from its key.
Inserting In A Hash Table:
1. Choose a Hash Function: The first step is selecting or designing a hash function suitable
for the data and the hash table size. The function should map input keys to indices
within the range of the hash table size, ensuring uniform distribution.
H(key) = key % sizeOfHashTable
The hash index must lie within the range of the hash table size, so the key is usually
taken modulo the table size to produce a valid index.
2. Calculate Hash Code: For a given key, apply the hash function to generate the index
where that key will be inserted in the hash table.
3. Insert Data:
i) Calculate the index using the hash function.
ii) Check whether the computed index in the hash table is empty:
   o If it is empty, place the key at that index.
   o If it is occupied (a collision occurs), resolve the collision using a chosen
     method:
      ▪ Separate Chaining: Add the key-value pair to a linked list at that index.
      ▪ Open Addressing: Probe for the next available slot based on the probing
        technique (e.g., linear, quadratic, or double hashing).
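The steps above can be sketched in C++ as follows. This is a minimal illustration that assumes non-negative integer keys; the names hashIndex, insertNoResolution, and EMPTY are hypothetical and not part of the lab material. It only detects a collision and leaves the resolution to the methods described in the next section.

```cpp
#include <cstddef>
#include <vector>

// Assumed sentinel marking a free slot (keys are assumed non-negative).
constexpr int EMPTY = -1;

// H(key) = key % sizeOfHashTable, as given in the handout.
std::size_t hashIndex(unsigned key, std::size_t tableSize) {
    return key % tableSize;
}

// Places the key if its slot is free. Returns false on a collision,
// in which case a collision-resolution method would be needed.
bool insertNoResolution(std::vector<int>& table, int key) {
    std::size_t index = hashIndex(static_cast<unsigned>(key), table.size());
    if (table[index] != EMPTY) {
        return false;  // slot already occupied: collision
    }
    table[index] = key;
    return true;
}
```

For a table of size 10, the keys 15 and 25 both hash to index 5, so the second insertion reports a collision.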
COLLISION RESOLUTION
1. SEPARATE CHAINING: This method turns the slot where the collision happened into a
linked list and adds the new key to that list.
o Time complexity: The worst-case complexity for searching and deletion is O(n).
o The hash table never fills up, because more elements can always be added to a chain.
o It requires extra space for the links between elements.
Code For Separate Chaining Using Vectors of Vectors:
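A minimal sketch of this approach, assuming non-negative integer keys and the modulo hash function used earlier. The class name ChainedHashTable and its method names are illustrative and are not taken from the lab's own listing; each chain is a std::vector rather than a linked list.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t size) : buckets_(size) {}

    // Appends the key to the chain at its hash index (duplicates are ignored).
    void insert(int key) {
        std::vector<int>& chain = buckets_[indexOf(key)];
        if (std::find(chain.begin(), chain.end(), key) == chain.end()) {
            chain.push_back(key);
        }
    }

    // Scans only the chain the key hashes to.
    bool search(int key) const {
        const std::vector<int>& chain = buckets_[indexOf(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Removes the key from its chain; returns false if it was not present.
    bool remove(int key) {
        std::vector<int>& chain = buckets_[indexOf(key)];
        auto it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) {
            return false;
        }
        chain.erase(it);
        return true;
    }

    std::size_t chainLength(std::size_t index) const {
        return buckets_[index].size();
    }

private:
    // Assumes key >= 0.
    std::size_t indexOf(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    std::vector<std::vector<int>> buckets_;  // one chain per slot
};
```

With a table of size 10, inserting 15 and 25 places both keys in the chain at index 5, which is why the worst-case search time grows with the chain length.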
2. OPEN ADDRESSING: Open addressing is a collision-resolution technique in which every
key is stored in the hash table itself; no key is kept anywhere else. As a result, the size
of the hash table must always be greater than or equal to the number of keys. (Note that
the table size can be increased by copying the old data into a larger table if needed.)
Open addressing is also known as closed hashing.
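Linear probing, the simplest of the probing techniques mentioned earlier, checks (i + 1) % table_size, (i + 2) % table_size, and so on after a collision at index i. The sketch below is one possible way to write it, assuming non-negative integer keys and a sentinel value for empty slots; the names LinearProbingTable and kEmptySlot are hypothetical. Deletion is left out because it needs extra bookkeeping (for example, a "deleted" marker) so that later searches do not stop early.

```cpp
#include <cstddef>
#include <vector>

// Assumed sentinel for an unused slot (keys are assumed non-negative).
constexpr int kEmptySlot = -1;

class LinearProbingTable {
public:
    explicit LinearProbingTable(std::size_t size) : slots_(size, kEmptySlot) {}

    // Probes home, home + 1, home + 2, ... (wrapping around).
    // Returns false only if every slot is occupied by another key.
    bool insert(int key) {
        std::size_t start = home(key);
        for (std::size_t j = 0; j < slots_.size(); ++j) {
            std::size_t idx = (start + j) % slots_.size();
            if (slots_[idx] == kEmptySlot || slots_[idx] == key) {
                slots_[idx] = key;
                return true;
            }
        }
        return false;
    }

    // Returns the slot index holding the key, or -1 if it is absent.
    int find(int key) const {
        std::size_t start = home(key);
        for (std::size_t j = 0; j < slots_.size(); ++j) {
            std::size_t idx = (start + j) % slots_.size();
            if (slots_[idx] == kEmptySlot) {
                return -1;  // an empty slot ends the probe sequence
            }
            if (slots_[idx] == key) {
                return static_cast<int>(idx);
            }
        }
        return -1;
    }

private:
    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }

    std::vector<int> slots_;
};
```

In a table of size 7, the keys 10, 17, and 24 all hash to index 3, so they end up in slots 3, 4, and 5 respectively.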
b) Quadratic probing: When a collision occurs at index i, the next slots checked are
(i + 1^2) % table_size,
(i + 2^2) % table_size,
(i + 3^2) % table_size, and so on.
This spreads out the probe positions and reduces clustering, but it needs a suitably
chosen table size (for example, a prime size kept at most half full) to guarantee that
an empty slot can be reached.
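The probe sequence above can be sketched as a single insertion routine. This is an illustrative version that assumes non-negative integer keys and a sentinel for free slots; the names quadraticInsert and kFree are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Assumed sentinel for an unused slot (keys are assumed non-negative).
constexpr int kFree = -1;

// Tries slots (i + j^2) % table_size for j = 0, 1, 2, ...
// where i is the key's home index.
// Returns the slot used, or -1 if no free slot was found in table_size probes.
int quadraticInsert(std::vector<int>& table, int key) {
    std::size_t size = table.size();
    std::size_t home = static_cast<std::size_t>(key) % size;
    for (std::size_t j = 0; j < size; ++j) {
        std::size_t idx = (home + j * j) % size;
        if (table[idx] == kFree) {
            table[idx] = key;
            return static_cast<int>(idx);
        }
    }
    return -1;  // probing gave up; the table may still have free slots
}
```

In a table of size 7, the keys 10, 17, 24, and 31 all have home index 3 and land in slots 3, 4, 0, and 5. Note that the routine can fail even when free slots remain, which is why the table size and load factor matter for this method.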
c) Double hashing: Double hashing uses a second hash function to calculate the
step size for probing. When a collision occurs at index i, the next slot is
determined by (i + j * hash2(key)) % table_size, where hash2 is a
secondary hash function and j = 1, 2, 3, ... increments with each probe. hash2
must never return 0, otherwise the probe would never leave index i. Double hashing
generally provides a good spread across the table and minimizes clustering.
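A minimal sketch of double hashing follows. The secondary function used here, hash2(key) = 1 + (key % (table_size - 1)), is one common choice and is an assumption of this example, not a formula from the handout; the names doubleHashInsert, stepSize, and kUnused are likewise hypothetical. Keys are assumed non-negative and the table size is assumed to be a prime of at least 3, so that every step size visits all slots.

```cpp
#include <cstddef>
#include <vector>

// Assumed sentinel for an unused slot (keys are assumed non-negative).
constexpr int kUnused = -1;

// Assumed secondary hash: always in the range 1 .. table_size - 1, never 0.
std::size_t stepSize(int key, std::size_t tableSize) {
    return 1 + static_cast<std::size_t>(key) % (tableSize - 1);
}

// Tries slots (i + j * hash2(key)) % table_size for j = 0, 1, 2, ...
// Returns the slot used, or -1 if the table is full.
int doubleHashInsert(std::vector<int>& table, int key) {
    std::size_t size = table.size();
    std::size_t home = static_cast<std::size_t>(key) % size;
    std::size_t step = stepSize(key, size);
    for (std::size_t j = 0; j < size; ++j) {
        std::size_t idx = (home + j * step) % size;
        if (table[idx] == kUnused) {
            table[idx] = key;
            return static_cast<int>(idx);
        }
    }
    return -1;
}
```

In a table of size 7, the keys 10, 17, and 24 share home index 3 but have step sizes 5, 6, and 1, so 17 moves to slot 2 and 24 moves to slot 4 instead of following the same probe path.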
REHASHING
Rehashing is the process of resizing a hash table and reassigning all the elements to new
positions within it. This is done to reduce the load factor, minimize collisions, and improve
the performance of hash operations. In essence, rehashing involves creating a new, larger
hash table and re-inserting each key-value pair from the old table using a new hash function
or the same one adjusted to the new table size.
Rehashing is typically triggered when the load factor of the hash table reaches or exceeds a
certain threshold, usually around 0.7 to 0.75. The load factor is defined as:
Load Factor = (number of elements stored) / (number of slots in the table)
A high load factor means there are more elements relative to the number of slots, leading to a
higher probability of collisions and therefore longer search times. Rehashing alleviates this by
expanding the table size and redistributing elements.
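The rehashing process can be sketched as below. This is an illustrative version built on linear probing; it assumes distinct, non-negative integer keys, an initial size of at least 1, a threshold of 0.75, and a growth rule of 2 * size + 1, none of which is prescribed by the handout. The names RehashingTable, kVacant, and kMaxLoadFactor are hypothetical.

```cpp
#include <cstddef>
#include <vector>

constexpr int kVacant = -1;              // assumed sentinel for a free slot
constexpr double kMaxLoadFactor = 0.75;  // assumed rehash threshold

class RehashingTable {
public:
    explicit RehashingTable(std::size_t initialSize)
        : slots_(initialSize, kVacant), count_(0) {}

    // Inserts a key (assumed not already present), then rehashes
    // if the load factor has reached the threshold.
    void insert(int key) {
        place(slots_, key);
        ++count_;
        if (loadFactor() >= kMaxLoadFactor) {
            rehash();
        }
    }

    bool contains(int key) const {
        std::size_t size = slots_.size();
        std::size_t home = static_cast<std::size_t>(key) % size;
        for (std::size_t j = 0; j < size; ++j) {
            std::size_t idx = (home + j) % size;
            if (slots_[idx] == kVacant) return false;
            if (slots_[idx] == key) return true;
        }
        return false;
    }

    double loadFactor() const {
        return static_cast<double>(count_) / static_cast<double>(slots_.size());
    }

    std::size_t capacity() const { return slots_.size(); }

private:
    // Linear-probing placement into the given table.
    static void place(std::vector<int>& table, int key) {
        std::size_t size = table.size();
        std::size_t idx = static_cast<std::size_t>(key) % size;
        while (table[idx] != kVacant) {
            idx = (idx + 1) % size;
        }
        table[idx] = key;
    }

    // Builds a larger table and re-inserts every key using the new size.
    void rehash() {
        std::vector<int> bigger(slots_.size() * 2 + 1, kVacant);
        for (int key : slots_) {
            if (key != kVacant) {
                place(bigger, key);
            }
        }
        slots_.swap(bigger);
    }

    std::vector<int> slots_;
    std::size_t count_;
};
```

Starting from 7 slots, the sixth insertion raises the load factor to 6/7, which is above 0.75, so the table grows to 15 slots and every key is placed again using key % 15.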
EXERCISES:
1. Design a library catalog system where each book is assigned a unique ID. To store and retrieve
book information efficiently, the system uses a hash table. However, due to limited storage
slots, books by the same author map to the same index. Create a mechanism to handle these
overlapping book IDs effectively.
Each book ID is a 3-digit number: the first two digits represent the book's author, and the last
digit identifies the book for that author.
a. Create a hash table of size 10 and insert 9 records (3 for author A, 2 for author B, 4 for
author C).
b. Search for 2 of the inserted books and 1 book that is not in the table.
c. Delete the 2 books from part b.
Display the hash table after each operation.
2. A fitness club stores its member IDs on a fixed-size table for quick access. Each unique member
ID is mapped to a position in the table using a hash function. Due to limited storage, the table
cannot have gaps left unused for long, so if a position is occupied, the system must look for the
next available slot for the new member ID.
a. Create a hash table of size 7 and insert member IDs 10 - 60.
b. Search for member IDs: 30, 50, 70.
c. Delete member IDs: 20 and 40. Insert additional member IDs: 70, 80 to show how the
deleted slots are reused.
Display the hash table after each operation.
3. A university uses an academic portal which has limited storage, so it adjusts its storage capacity
dynamically when the number of student IDs exceeds a certain threshold. However, these
unique IDs need to be strategically placed to minimize search times while ensuring that all slots
are accessible.
a. Create a hash table with an initial size of 7 and a load factor threshold of 0.75. Insert
student IDs: 12, 22, 32, 42, 52, 62.
b. Search for student IDs: 22, 42, 72.
c. Insert additional IDs: 72, 82 to exceed the load factor threshold. Use a new hash
function based on the resized table size.
Display the hash table after each operation.
4. A banking system is designed to store customer account numbers on a hash table. To ensure
data security and efficiency, the system uses an additional mathematical formula to decide the
next slot when collisions occur. This method ensures that even closely related account numbers
do not lead to clusters of occupied slots.
a. Create a hash table of size 11 to store customer account numbers. Use primary_hash =
ID % table_size for the initial position and secondary_hash = 7 - (ID % 7) for the step
size. Insert the following account numbers: 101, 111, 121, 131, 141, 151.
b. Search for account numbers: 111, 141, 161.
c. Delete account numbers: 111 and 131. Insert additional account numbers: 161 and 171
to demonstrate how the secondary formula resolves collisions while avoiding clustering.
Display the hash table after each operation.