UNIT 1- Hashing
CLASS : SE COMPUTER - A
SUBJECT : DSA (SEM-II)
SYLLABUS
Hash Table: Concepts-hash table, hash function, basic operations,
bucket, collision, probe, synonym, overflow, open hashing, closed
hashing, perfect hash function, load density, full table, load factor,
rehashing, issues in hashing
Hash Functions: properties of good hash function, division,
multiplication, extraction, mid-square, folding and universal
Collision Resolution Strategies: open addressing and chaining,
Hash table overflow- open addressing and chaining, extendible
hashing, closed addressing and separate chaining.
Skip List: representation, searching and operations- insertion,
removal
UNIT-I
HASHING
• Hashing is a searching technique that takes constant time on average:
the average-case time complexity of a lookup is O(1). So far, we have
studied two searching techniques, linear search and binary search.
• The worst-case time complexity is O(n) for linear search and O(log n)
for binary search. In both techniques the search time depends on the
number of elements, but we want a technique that takes constant time.
Hashing provides this.
• In Hashing technique, the hash table and hash function are used.
Using the hash function, we can calculate the address at which the
value can be stored.
INTRODUCTION
1. Hashing is finding an address where the data is
to be stored as well as located using a key with
the help of the algorithmic function.
2. Hashing is a method of directly computing the
address of the record with the help of a key by
using a suitable mathematical function called
the hash function
3. A hash table is an array-based structure used to
store <key, information> pairs
HASHING/HASH
FUNCTION
• The main idea behind the hashing is to create the (key/value)
pairs. If the key is given, then the algorithm computes the index
at which the value would be stored. It can be written as:
• Index = hash(key)
INTRODUCTION
3. Hash Table:
• A hash table is an array-based structure used to store
<key, value> pairs.
• A Hash table is a data structure that stores some
information, and the information has basically two main
components, i.e., key and value.
• A Hash table can be used for quick insertion, searching
and retrieval of data.
• A hash function is applied to the key of the record
being stored, returning an index within the range of the
hash table.
• The resulting address is used as the basis for storing and
retrieving records. This address is called the home address
of the record.
HASH TABLE
• The item is then stored in the table at that index position.
• A hash table is one of the most important data structures. It
uses a special function, known as a hash function, that maps
a given key to an index in the table so that elements can be
accessed faster.
• The hash table can be implemented with the help of an
associative array.
• The efficiency of mapping depends upon the efficiency of
the hash function used for mapping.
HASH TABLE
Applications of hash tables:
1. Databases (indexing technique)
2. Associative arrays
3. Sets
4. Memory cache
HASH
FUNCTION
• A hash function maps a key into the range [0, Max - 1]. The result
is used as an index (or address) into the hash table for storing and
retrieving records.
• The address generated by the hash function is called the home
address.
• All home addresses refer to a particular area of memory, and that
area is called the prime area.
PROPERTIES OF HASH
FUNCTION
1) The hash function should be simple to compute.
2) The number of collisions should be small.
3) The hash function uses all the input data.
4) The hash function "uniformly" distributes the data across
the entire set of possible hash values.
5) The hash function generates very different hash values
for similar strings.
BUCKET
• A bucket is an index position in the hash table that can store more
than one record.
• A hash file stores data in bucket format.
• Bucket is considered a unit of storage.
• A bucket typically stores one complete disk block, which in turn
can store one or more records.
• When the same index is mapped with two keys, then both
the records are stored in the same bucket
Load Density
The load factor (load density) is a measure of how full (occupied) the
hash table is. It is defined as:
α = number of occupied slots / total number of slots
For example, in a hash table of size 1000 with 500 slots filled, the
load factor is α = 500/1000 = 0.5
If the load factor (α) is bounded by a constant, then the expected time
complexity of Insert, Search, Delete = Θ(1)
TYPES OF HASH FUNCTION
1. Division Method:
The key k is divided by the table size M, and the remainder is the hash
value: h(k) = k mod M
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95 = 90
k = 1276
M = 11
h(1276) = 1276 mod 11 = 0
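The division method can be sketched in Python as follows. This is a minimal illustration that assumes integer keys; the function name division_hash is hypothetical and does not come from the slides.

```python
def division_hash(key: int, table_size: int) -> int:
    """Division method: h(k) = k mod M, where M is the table size."""
    if table_size <= 0:
        raise ValueError("table_size must be positive")
    return key % table_size


if __name__ == "__main__":
    # The two examples from the slide.
    print(division_hash(12345, 95))  # 90
    print(division_hash(1276, 11))   # 0
```

In practice M is often chosen to be a prime number so that keys with regular patterns spread more evenly over the table.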
TYPES OF HASH FUNCTION
2. Mid Square Method
It involves two steps to compute the hash value:
1. Square the key k to obtain k².
2. Extract the middle digits of k² as the hash value.
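TYPES OF HASH FUNCTION
3. Digit Folding Method:
The key is divided into parts of equal size (the last part may be
shorter), and the parts are added to obtain the hash value.
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(k) = 51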
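A short Python sketch of digit folding is given below. It assumes non-negative integer keys split into two-digit groups from the left, as in the example; the name digit_folding_hash and the optional table_size parameter are illustrative assumptions, not part of the slides.

```python
from typing import Optional


def digit_folding_hash(key: int, group_size: int = 2,
                       table_size: Optional[int] = None) -> int:
    """Split the decimal digits of key into groups and add the groups."""
    digits = str(abs(key))
    parts = [int(digits[i:i + group_size])
             for i in range(0, len(digits), group_size)]
    total = sum(parts)
    # Assumption: if a table size is given, reduce the sum into range.
    return total % table_size if table_size else total


if __name__ == "__main__":
    print(digit_folding_hash(12345))  # 12 + 34 + 5 = 51
```

When the sum can exceed the table size, the result is usually reduced with a final mod M step, which is what the optional table_size argument does here.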
TYPES OF HASH
FUNCTION
4. Multiplication method :
This method involves the following steps:
• Choose a constant value A such that 0 < A < 1.
• Multiply the key value with A.
• Extract the fractional part of kA.
• Multiply the result of the above step by the size of the hash table
i.e. M.
• The resulting hash value is obtained by taking the floor of
the result obtained in step 4.
Formula: h(k) = floor(M * (kA mod 1))
where
M is the size of the hash table,
k is the key value,
A is a constant value,
and "kA mod 1" means the fractional part of kA, that is, kA - ⌊kA⌋.
Example:
k = 12345
A = 0.357840
M = 100
kA = 12345 × 0.357840 = 4417.5348
kA mod 1 = 0.5348
h(12345) = floor(100 × 0.5348) = floor(53.48) = 53
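The multiplication method can be sketched in Python as follows. This is only an illustration using floating-point arithmetic; the function name multiplication_hash is hypothetical.

```python
import math


def multiplication_hash(key: int, table_size: int, a: float) -> int:
    """Multiplication method: h(k) = floor(M * (k*A mod 1)), with 0 < A < 1."""
    if not 0 < a < 1:
        raise ValueError("A must satisfy 0 < A < 1")
    fractional_part = (key * a) % 1.0   # k*A mod 1
    return math.floor(table_size * fractional_part)


if __name__ == "__main__":
    # Values from the example: k = 12345, A = 0.357840, M = 100
    print(multiplication_hash(12345, 100, 0.357840))
```

Because 0 <= kA mod 1 < 1, the result always lies in the range 0 to M - 1, so it is a valid table index for any choice of M.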
COLLISION
h(26) = 26 % 10 = 6
Therefore, two values are mapped to the same index, i.e., 6, and this
leads to the collision problem. To resolve collisions, we use techniques
known as collision resolution techniques.
Linear probing: The table is searched sequentially, starting from the
position where the collision occurred and moving forward. If the end of
the list is reached and no empty slot is found, the probing starts again
at the beginning of the list.
Quadratic probing: This method uses quadratic polynomial expressions to
find the next available free slot.
Double hashing: This technique uses a secondary hash function to find
the next available free slot.
Linear Probing
Insertion
The insertion algorithm is as follows:
1. Use hash function to find index for a record
2. If that spot is already in use, we use next available spot in a
"higher" index.(Find first empty slot of hash table in linear
fashion)
3. Treat the hash table as if it is round, if you hit the end of the
hash table, go back to the front
Linear Probing
Searching
The searching algorithm is as follows:
1. Use the hash function to find the index where the item would have
been inserted.
2. If it isn't there, search the records after that hash location
(remember to treat the table as circular) until either it is found or
an empty slot is found. If an empty slot is reached before the record
is found, the record is not in the table.
Linear Probing
Delete/ Remove
The Delete/Remove algorithm is as follows:
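1. Find the record using the searching algorithm: start at its hash
location and search the records after it (remember to treat the table as
circular) until either it is found or an empty slot is found. If an empty
slot is reached before the record is found, the record is not there.
2. If the record is found, remove it. Simply making the slot empty can cut
off records that were placed after it in the same cluster, so the slot is
usually marked as deleted, or the records that follow it in the cluster
are re-inserted.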
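The three linear probing operations can be sketched together in Python. This is a minimal sketch under stated assumptions: integer keys, h(k) = k mod m as the hash function, and deletion by marking the slot with a "deleted" marker (a tombstone) so that later searches are not cut short. The class and method names are illustrative only.

```python
class LinearProbingTable:
    _EMPTY = object()    # slot that has never been used
    _DELETED = object()  # tombstone left behind by delete()

    def __init__(self, size=10):
        self.size = size
        self.slots = [self._EMPTY] * size

    def _probe(self, key):
        """Yield slot indexes starting at the home address, wrapping around."""
        start = key % self.size  # assumption: division-method hash
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key):
        first_free = None
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is self._EMPTY:
                if first_free is None:
                    first_free = index
                break  # key cannot be stored beyond an empty slot
            if slot is self._DELETED:
                if first_free is None:
                    first_free = index  # remember reusable slot, keep looking
            elif slot == key:
                return index  # already present
        if first_free is None:
            raise OverflowError("hash table is full")
        self.slots[first_free] = key
        return first_free

    def search(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is self._EMPTY:
                return None  # empty slot reached: key is not in the table
            if slot is not self._DELETED and slot == key:
                return index
        return None

    def delete(self, key):
        index = self.search(key)
        if index is None:
            return False
        self.slots[index] = self._DELETED
        return True
```

For example, with size 10, inserting 6, 16 and 26 places them at indexes 6, 7 and 8. After deleting 16, a search for 26 still succeeds because the tombstone at index 7 does not stop the probe.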
Linear Probing
Let's understand the linear probing through an example.
Consider the following example for linear probing:
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10 and h(k) = 2k + 3
index = h(k) % 10
• The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5
respectively.
• The calculated index value of 11 is 5 which is already occupied by
another key value, i.e., 6.
• When linear probing is applied, the nearest empty cell to the index 5 is
6; therefore, the value 11 will be added at the index 6.
• The next key value is 13. The index value associated with this key
value is 9 when hash function is applied.
• The cell is already filled at index 9.
• When linear probing is applied, the nearest empty cell to the index 9 is
0; therefore, the value 13 will be added at the index 0
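• The next key value is 7. Its calculated index is 7, which is already
occupied by 2, so the value 7 is added at the next empty cell, index 8.
• The last key value is 12. Its calculated index is also 7. The cells at
indexes 7, 8, 9, 0 and 1 are occupied, so the value 12 is added at index 2.
Final table (m = 10):
index   key
0       13
1       9
2       12
3       (empty)
4       (empty)
5       6
6       11
7       2
8       7
9       3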
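The worked example can be checked with a few lines of Python. This is a sketch only; the helper name linear_probe_insert_all is hypothetical, and the hash function is passed in so that h(k) = 2k + 3 from the example can be used.

```python
def linear_probe_insert_all(keys, m, h):
    """Insert keys into a table of size m using linear probing."""
    table = [None] * m
    for key in keys:
        home = h(key) % m
        for step in range(m):
            slot = (home + step) % m  # wrap around at the end of the table
            if table[slot] is None:
                table[slot] = key
                break
        else:
            raise OverflowError("hash table is full")
    return table


if __name__ == "__main__":
    keys = [3, 2, 9, 6, 11, 13, 7, 12]
    table = linear_probe_insert_all(keys, 10, lambda k: 2 * k + 3)
    for index, key in enumerate(table):
        print(index, key)
```

Running it for A = 3, 2, 9, 6, 11, 13, 7, 12 shows 11 at index 6 and 13 at index 0, as described in the example.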
Linear Probing
Advantages
1. Simple
We linearly iterate to find the next slot.
2. Fast
Thanks to localized access (locality of reference), it gives
constant-time performance in the ideal situation.
Linear Probing
Disadvantages
There are a few drawbacks when using linear probing to maintain
a hash table. Let’s take a look together!
Clustering
Linear probing is sensitive to a phenomenon called clustering.
Clustering is a phenomenon that occurs as elements are added to a
hash table. Elements may have a tendency to clump together,
forming clusters, which over time will significantly impact
performance for searching and adding elements because we’ll
approach a worst case O(n) time complexity.
Linear Probing with
Chaining
1. Chaining without replacement
Quadratic Probing
h(x, i) = (h´(x) + i²) mod m, where h´(x) is the ordinary hash function.
We can also use other quadratic expressions with some constants.
The value of i = 0, 1, . . ., m-1. So we start from i = 0 and increase it
until we find a free slot. Initially, when i = 0, h(x, i) is the same as
h´(x).
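A minimal Python sketch of quadratic probing is shown below. It assumes h´(x) = x mod m and the probe sequence h(x, i) = (h´(x) + i²) mod m; both the function name and the choice of h´ are assumptions made for illustration.

```python
def quadratic_probe_insert(table, key):
    """Insert key using h(x, i) = (h'(x) + i*i) mod m; return the slot used."""
    m = len(table)
    home = key % m  # assumption: h'(x) = x mod m
    for i in range(m):
        index = (home + i * i) % m
        if table[index] is None:
            table[index] = key
            return index
    return None  # no free slot found along this probe sequence


if __name__ == "__main__":
    table = [None] * 7
    for key in (10, 17, 24):  # all three keys have home address 3
        print(key, "->", quadratic_probe_insert(table, key))
```

Note that a quadratic probe sequence does not necessarily visit every slot, so an insertion can fail even when the table still has free cells.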
CLOSED
HASHING
DOUBLE HASHING
Double hashing : It is a collision resolving technique in Open
Addressed Hash tables. Double hashing uses the idea of applying a
second hash function to key when a collision occurs.
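Double hashing can be sketched in Python as follows. The two hash functions used here, h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)), are a common textbook choice and are assumptions, not functions given in the slides. The function name is also illustrative.

```python
def double_hash_insert(table, key):
    """Insert key using the probe sequence (h1 + i*h2) mod m."""
    m = len(table)  # assumption: m is prime and at least 2
    h1 = key % m
    h2 = 1 + (key % (m - 1))  # step size; never zero
    for i in range(m):
        index = (h1 + i * h2) % m
        if table[index] is None:
            table[index] = key
            return index
    return None  # every probed slot was occupied


if __name__ == "__main__":
    table = [None] * 7
    for key in (10, 17, 24):  # all three keys collide at index 3 under h1
        print(key, "->", double_hash_insert(table, key))
```

Keys that collide under h1 usually get different step sizes from h2, so they follow different probe sequences. This reduces the clustering seen with linear probing.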
Operations in Open Addressing
Insert Operation-
• The hash function is used to compute the hash value for the key to be
inserted.
• The hash value is then used as an index to store the key in the hash
table.
In case of collision,
• Probing is performed until an empty bucket is found.
• Once an empty bucket is found, the key is inserted.
• Probing is performed in accordance with the technique used for
open addressing.
Search Operation-
• The key is hashed and the same probe sequence is followed until the key
is found or an empty bucket is reached.
Rehashing
As the name suggests, rehashing means hashing again.
Basically, when the load factor increases to more than its
pre-defined value (a common default value of the load factor
is 0.75), the complexity increases.
So to overcome this, the size of the array is increased
(doubled) and all the values are hashed again and stored in
the new double-sized array to maintain a low load factor and
low complexity.
In such situations, we have to transfer the entries from the
old table to the new table by recomputing their positions
using the hash function.
Rehashing
But that comes with a price:
With the new size the hash function can change, which means
that all the elements stored earlier (for example, the 75
elements of a 100-slot table at load factor 0.75) may now map
to different indexes. So we rehash all the stored elements
with the new hash function and place them at their new
indexes in the resized, bigger hash table.
Rehashing
Why rehashing?
Rehashing is done because whenever key value pairs are
inserted into the map, the load factor increases, which implies
that the time complexity also increases as explained above.
This might not give the required time complexity of O(1).
Hence, rehash must be done, increasing the size of the
bucketArray so as to reduce the load factor and the time
complexity
Rehashing
How is rehashing done?
Rehashing can be done as follows:
• For each addition of a new entry to the map, check the load
factor.
• If it’s greater than its pre-defined value (or default value of 0.75
if not given), then Rehash.
• For Rehash, make a new array of double the previous size and
make it the new bucketarray.
• Then traverse to each element in the old bucketArray and call
the insert() for each so as to insert it into the new larger bucket
array.
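The steps above can be sketched in Python with a small chained hash map. This is an illustrative sketch only: the class name ChainedHashMap, the initial capacity of 4 and the use of Python's built-in hash() are assumptions. The threshold of 0.75 and the doubling follow the steps listed above.

```python
class ChainedHashMap:
    def __init__(self, capacity=4, max_load_factor=0.75):
        self.bucket_array = [[] for _ in range(capacity)]
        self.count = 0
        self.max_load_factor = max_load_factor

    def _bucket(self, key):
        return self.bucket_array[hash(key) % len(self.bucket_array)]

    def load_factor(self):
        return self.count / len(self.bucket_array)

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # update existing entry
                return
        bucket.append((key, value))
        self.count += 1
        # Check the load factor after every addition.
        if self.load_factor() > self.max_load_factor:
            self._rehash()

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)

    def _rehash(self):
        old_buckets = self.bucket_array
        # New bucket array of double the previous size.
        self.bucket_array = [[] for _ in range(2 * len(old_buckets))]
        self.count = 0
        # Re-insert every entry so its index is recomputed for the new size.
        for bucket in old_buckets:
            for key, value in bucket:
                self.insert(key, value)
```

With an initial capacity of 4, the fourth insertion raises the load factor to 1.0, which is above 0.75, so the bucket array is doubled to 8 and the load factor drops to 0.5.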
Rehashing & Double Hashing
How Rehashing is different than Double Hashing?
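• Double hashing is a collision resolution technique: a second hash
function determines the probe step when the home address is already
occupied. Rehashing is a resizing technique: when the load factor
grows too large, the table size is increased and every stored key is
hashed again to obtain its position in the larger table.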
Dynamic Hashing
1. Static hashing does not expand or shrink the hash table
dynamically as the size of database grows or shrinks and
bucket overflow occurs.
2. The dynamic hashing method is used to overcome the
problems of static hashing like bucket overflow.
3. In this method, data buckets grow or shrink as the number of
records increases or decreases.
4. This method is also known as Extendable/Extensible
hashing method.
Extensible/Extendible Hashing
How to search a key
1. First, calculate the hash address of the key.
2. Check how many bits are used in the directory. Call this
number of bits i.
3. Take the least significant i bits of the hash address.
This gives an index of the directory.
4. Now using the index, go to the directory and find bucket
address where the record might be.
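Step 3 of the search procedure, taking the least significant i bits, is a single bit-mask operation. A minimal Python sketch follows; the function name directory_index is hypothetical.

```python
def directory_index(hash_address: int, i: int) -> int:
    """Return the least significant i bits of the hash address."""
    return hash_address & ((1 << i) - 1)


if __name__ == "__main__":
    address = 0b10001  # a five-bit hash address
    print(format(directory_index(address, 2), "02b"))  # last two bits
    print(format(directory_index(address, 3), "03b"))  # last three bits
```

The mask (1 << i) - 1 has exactly i one-bits, so the bitwise AND keeps only the last i bits of the address.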
Extensible/Extendible Hashing
How to insert a new record
1. Firstly, you have to follow the same procedure
for retrieval, ending up in some bucket.
2. If there is still space in that bucket, then place the record
in it.
3. If the bucket is full, then we will split the bucket
and redistribute the records.
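The insertion procedure, including the bucket split, can be sketched in Python as below. This is a simplified sketch under several assumptions: keys are distinct non-negative integers, the key itself is used as its hash address, the directory is indexed by the least significant bits, and each bucket holds at most two keys. All class and method names are illustrative.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth  # number of bits this bucket uses
        self.capacity = capacity
        self.keys = []


class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.global_depth = 1  # number of bits used by the directory
        self.directory = [Bucket(1, bucket_capacity),
                          Bucket(1, bucket_capacity)]

    def _index(self, key):
        # Assumption: the key is its own hash address.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.keys:
                return
            if len(bucket.keys) < bucket.capacity:
                bucket.keys.append(key)  # space available: store the record
                return
            self._split(bucket)  # bucket full: split, then retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; entry i + old_size mirrors entry i.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth, bucket.capacity)
        high_bit = 1 << (bucket.local_depth - 1)
        # Entries whose newly used bit is 1 now point to the sibling.
        for i, entry in enumerate(self.directory):
            if entry is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the records of the split bucket.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)
```

For example, inserting 1, 5 and 9 into an empty structure splits the bucket twice, because all three keys end in the bits 01. After the second split the directory uses three bits: keys 1 and 9 share the bucket for 001 and key 5 is in the bucket for 101.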
Extensible/Extendible Hashing
For example:
Consider the following grouping of keys into buckets
depending on the last two bits of their hash address:
Extensible/Extendible
Hashing
The last two bits of the hash addresses of keys 2 and 4 are 00, so they
go into bucket B0. The last two bits for keys 5 and 6 are 01, so they go
into bucket B1. The last two bits for keys 1 and 3 are 10, so they go
into bucket B2. The last two bits for key 7 are 11, so it goes into B3.
Extensible/Extendible
Hashing
Insert key 9 with hash address 10001 into the above structure:
1. Since key 9 has hash address 10001, whose last two bits are 01, it must
go into bucket B1. But bucket B1 is full, so it is split.
2. The split separates 5 and 9 from 6: the last three bits of 5 and 9 are
001, so they go into bucket B1, and the last three bits of 6 are 101, so
it goes into bucket B5.
3. Keys 2 and 4 are still in B0. Bucket B0 is pointed to by the 000 and
100 directory entries because the last two bits of both entries are 00.
4. Keys 1 and 3 are still in B2. Bucket B2 is pointed to by the 010 and
110 directory entries because the last two bits of both entries are 10.
5. Key 7 is still in B3. Bucket B3 is pointed to by the 111 and 011
directory entries because the last two bits of both entries are 11.
Extensible/Extendible
Hashing
Example
Solve Below hashing problem using extendible hashing.
Advantages of Extensible Hashing
1. In this method, the performance does not decrease as the data grows.
2. Memory is well utilized because it grows and shrinks with the data.
There will not be any unused memory lying around.
3. This method is good for a dynamic database where data grows and
shrinks frequently.
Disadvantages of Extensible Hashing
1. As the data size increases, the memory used for buckets is also
increased.
2. The bucket overflow situation can still occur, but it takes longer to
reach this situation than with static hashing.
Linked List Benefits &
Drawbacks
• Benefits:
- Easy to insert & delete in O(1) time
- Don’t need to estimate total memory needed
• Drawbacks:
- Hard to search in less than O(n) time (binary search
doesn’t work, for example)
- Hard to jump to the middle
• Skip Lists:
- fix these drawbacks
- good data structure for a dictionary ADT
Skip List
1. A skip list is a probabilistic data structure.
2. Invented around 1990 by Bill Pugh
3. Expected search time is O(log n)
4. The skip list is used to store a sorted list of elements or
data with a linked list.
5. It allows the elements or data to be searched
efficiently.
6. In one single step, it skips several elements of the entire
list, which is why it is known as a skip list.
7. Randomized/Probabilistic data structure:
- use random coin flips to build the data structure
Skip List
8. The skip list is an extended version of the linked list.
9. It allows the user to search, remove, and insert elements
very quickly.
10. It consists of a base list that contains all the elements,
together with a hierarchy of links that skip over subsequent
elements.
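A compact skip list with search, insertion and removal can be sketched in Python as follows. It is a sketch under stated assumptions: keys are comparable and distinct, each node is promoted to the next level with probability p = 0.5 (the coin flip), and the height is capped at MAX_LEVEL = 16. These names and constants are illustrative choices, not taken from the slides.

```python
import random


class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] = next node on level i


class SkipList:
    MAX_LEVEL = 16

    def __init__(self, p=0.5, seed=None):
        self.p = p
        self.level = 1  # number of levels currently in use
        self.head = SkipNode(None, self.MAX_LEVEL)
        self.rng = random.Random(seed)

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < self.p:
            level += 1  # "heads": promote the node one more level
        return level

    def _find_predecessors(self, key):
        """For each level, find the last node whose key is less than key."""
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]  # skip ahead on this level
            update[i] = node
        return update

    def search(self, key):
        node = self._find_predecessors(key)[0].forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = self._find_predecessors(key)
        node = update[0].forward[0]
        if node is not None and node.key == key:
            return False  # duplicate keys are not stored
        level = self._random_level()
        self.level = max(self.level, level)
        new_node = SkipNode(key, level)
        for i in range(level):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node
        return True

    def remove(self, key):
        update = self._find_predecessors(key)
        node = update[0].forward[0]
        if node is None or node.key != key:
            return False
        for i in range(len(node.forward)):
            if update[i].forward[i] is node:
                update[i].forward[i] = node.forward[i]
        while self.level > 1 and self.head.forward[self.level - 1] is None:
            self.level -= 1  # drop levels that became empty
        return True

    def to_list(self):
        keys, node = [], self.head.forward[0]
        while node is not None:
            keys.append(node.key)
            node = node.forward[0]
        return keys
```

Level 0 is the ordinary sorted linked list (the base list). The higher levels contain fewer nodes, so a search that starts at the top level skips over several elements in a single step.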
Skip List
Skip list structure (figure not reproduced; label: linked list)