Hashing
SYLLABUS
Hash table representation: Hash Functions, Collision Resolution - Separate Chaining, Open Addressing - Linear Probing, Quadratic
Probing, Double Hashing, Rehashing, Extendible Hashing.
Example: H(key) = key % 10 (table size 10)
54%10=4, 72%10=2, 89%10=9, 37%10=7
Index 2: 72, index 4: 54, index 7: 37, index 9: 89
ACE Engineering College(Autonomous)
Mid Square Method
In this method, the key is squared and the middle part of the result is used as the index.
Consider the key 3111. Then
3111^2 = 9678321
For a hash table of size 1000,
H(3111) = 783 (the middle 3 digits)
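A minimal sketch of the mid-square computation above. It assumes the index is the middle `digits` digits of the decimal square. How to pick the "middle" when the leftover length is odd is our assumption, since the notes do not specify it.

```python
def mid_square(key: int, digits: int = 3) -> int:
    """Square the key and return its middle `digits` decimal digits as the index."""
    squared = str(key * key)
    # Start so the extracted run sits in the middle; when the leftover digit
    # count is odd this leans one digit to the left (an assumed convention).
    start = max(0, (len(squared) - digits) // 2)
    return int(squared[start:start + digits])


print(mid_square(3111))  # 3111^2 = 9678321 -> middle three digits 783
```

With `digits=3` the result always lies in 0..999, which matches a table of size 1000.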
Folding Method
The key is divided into separate parts, and these parts are combined with some simple operation (here, addition) to produce the hash key.
For example, consider a record key 12365412,
divided into separate parts as: 123 654 12
Add all these parts:
H(key) = 123 + 654 + 12
= 789
The record will be placed at location 789 in the hash table.
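The folding (shift) step above can be sketched as follows. The part length of 3 digits comes from the example. The table size of 1000 and the final `% table_size` are our assumptions: they keep the index in range when the sum of the parts grows too large.

```python
def fold_shift(key: int, part_len: int = 3, table_size: int = 1000) -> int:
    """Split the key's digits into chunks of part_len, add them, reduce mod table_size."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % table_size


print(fold_shift(12365412))  # 123 + 654 + 12 = 789
```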
A collision occurs when the hash function maps two different keys
to the same location. Obviously, two records cannot be stored in the
same location.
Similarly, when there is no room for a key in the hash table,
the situation is called Overflow.
Example:
Consider the hash function H(key) = recordkey % 10, with hash table size 10.
The record keys are: 131, 44, 43, 78, 19, 36, 57 and 77
131%10=1
44%10=4
43%10=3
78%10=8
19%10=9
36%10=6
57%10=7
77%10=7  Collision (index 7 already holds 57)

Index  Key
0      -
1      131
2      -
3      43
4      44
5      -
6      36
7      57
8      78
9      19
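Searching forward from index 7 for the next vacant position, the subsequent indices 8 and 9 are also
occupied, so there is no place for 77 before the end of the hash table (unless the search wraps around
to the free slot at index 0). This situation is called Overflow.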
Linear probing
Quadratic probing
Double hashing.
Rehashing
LINEAR PROBING
When two records demand the same home bucket in the hash table, the
collision can be resolved by placing the second record linearly down the
table, in the first empty bucket found.
In linear probing, the hash table is represented as a one-dimensional array
with indices ranging from 0 to (hash table size - 1).
Before inserting, initialize all slots in the table to empty.
Marking the slots as empty allows us to detect collisions and overflows when we insert into the hash table.
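A minimal sketch of linear probing as described above. It assumes `None` marks an empty slot and that the probe wraps around from the last index back to 0. The function name `linear_probe_insert` is hypothetical.

```python
from typing import List, Optional


def linear_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key with h(key) = key % size, stepping one slot at a time on collision.

    Returns the index used; raises OverflowError if every slot is occupied.
    """
    size = len(table)
    home = key % size
    for i in range(size):
        index = (home + i) % size   # wrap around to index 0 after the last slot
        if table[index] is None:    # None marks an empty slot
            table[index] = key
            return index
    raise OverflowError(f"no empty slot for key {key}")


table: List[Optional[int]] = [None] * 10  # all slots initialised to empty
for k in [131, 44, 43, 78, 19, 36, 57, 77]:
    linear_probe_insert(table, k)
print(table)
```

With wrap-around, 77 probes indices 7, 8 and 9 and then lands in slot 0. A scan that stops at the last index would instead report the overflow situation described earlier.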
Consider a hash table of size seven, with starting index zero, and a hash
function (3x + 4) mod 7. Assuming the hash table is initially empty, which of
the following is the contents of the table when the sequence 1, 3, 8, 10 is
inserted into the table using closed hashing? Note that ‘_’ denotes an empty
location in the table. (GATE 2007)
(A) 8, _, _, _, _, _, 10
(B) 1, 8, 10, _, _, _, 3
(C) 1, _, _, _, _, _,3
(D) 1, 10, 8, _, _, _, 3
Answer: B
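The answer can be checked by hand. Here we assume the closed-hashing collision strategy is linear probing with wrap-around, which is the usual reading of this question.

```latex
\begin{aligned}
h(1)  &= (3\cdot 1 + 4) \bmod 7 = 7 \bmod 7 = 0   &&\Rightarrow \text{slot } 0\\
h(3)  &= (3\cdot 3 + 4) \bmod 7 = 13 \bmod 7 = 6  &&\Rightarrow \text{slot } 6\\
h(8)  &= (3\cdot 8 + 4) \bmod 7 = 28 \bmod 7 = 0  &&\Rightarrow 0 \text{ taken, slot } 1\\
h(10) &= (3\cdot 10 + 4) \bmod 7 = 34 \bmod 7 = 6 &&\Rightarrow 6, 0, 1 \text{ taken, slot } 2
\end{aligned}
```

The table is therefore 1, 8, 10, _, _, _, 3, which is option (B).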
OPEN ADDRESSING
QUADRATIC PROBING
DOUBLE HASHING
Double hashing is a technique in which a second hash function is applied to the key when a collision
occurs.
Double hashing can be done using:
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE
is the size of the hash table.
(We repeat, increasing i from 0, while collisions occur.)
The second hash function gives the number of positions to move from the point of collision
when looking for a place to insert.
There are two important rules to be followed for the second hash function:
It must never evaluate to zero.
It must make sure that all cells can be probed.
The formulas used for double hashing are:
H1(key) = key % TABLE_SIZE
H2(key) = M - (key % M)
Example: Keys: 46, 28, 21, 35, 57, 39, 19, 50
h1(x) = x mod 11
h2(x) = M - (x mod M)   // where M = 7, a prime smaller than the table size 11
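The resulting table for this example is not shown in the notes, so here is a minimal sketch that computes it. It uses the probe formula (h1(key) + i * h2(key)) % TABLE_SIZE with h1(x) = x mod 11 and h2(x) = 7 - (x mod 7). The function name `double_hash_insert` is hypothetical.

```python
from typing import List, Optional


def double_hash_insert(table: List[Optional[int]], key: int, m: int) -> int:
    """Probe (h1(key) + i * h2(key)) % size for i = 0, 1, 2, ... until a slot is empty."""
    size = len(table)
    h1 = key % size
    h2 = m - (key % m)  # always in 1..m, so the step is never zero
    for i in range(size):
        index = (h1 + i * h2) % size
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError(f"no empty slot found for key {key}")


table: List[Optional[int]] = [None] * 11  # h1(x) = x mod 11
for k in [46, 28, 21, 35, 57, 39, 19, 50]:
    double_hash_insert(table, k, m=7)     # h2(x) = 7 - (x mod 7)
print(table)
```

The final placement is: index 1: 39, 2: 46, 3: 19, 6: 28, 7: 50, 8: 57, 9: 35, 10: 21, with indices 0, 4 and 5 empty. For example, 19 starts at 8 with step 2 and probes 8, 10 and 1 before finding 3 free. Because 11 is prime and every step lies in 1..7, each key's probe sequence can reach every cell, which satisfies the second rule above.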
REHASHING
Since the table is full, a new table is created. The size of this table will be 17,
because 17 is the first prime that is at least twice the old table size.
The new hash function is h(x) = x mod 17.
The old table is then scanned, and each element is reinserted into the new table using h(x) = x mod 17.
[Figure: the resulting table after rehashing, h(x) = x mod 17]
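A minimal sketch of the rehashing step, assuming the new size is the first prime at least twice the old size. The notes do not say how collisions are resolved in the new table; linear probing is our assumption here. The old table of size 7 and its keys are hypothetical, chosen only for illustration.

```python
from typing import List, Optional


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime_at_least(n: int) -> int:
    while not is_prime(n):
        n += 1
    return n


def rehash(old: List[Optional[int]]) -> List[Optional[int]]:
    """Build a table of size = first prime >= 2 * old size and reinsert every key.

    Assumption: collisions in the new table are resolved by linear probing.
    """
    new_size = next_prime_at_least(2 * len(old))
    new: List[Optional[int]] = [None] * new_size
    for key in old:                    # scan the old table slot by slot
        if key is None:
            continue
        index = key % new_size         # h(x) = x mod new_size
        while new[index] is not None:  # linear probing, wrapping at the end
            index = (index + 1) % new_size
        new[index] = key
    return new


old_table = [14, 15, 23, None, None, None, 6]  # hypothetical size-7 table (x mod 7)
new_table = rehash(old_table)
print(len(new_table))  # 17
```

Here 2 * 7 = 14, and the first prime at least 14 is 17. In the new table, 23 and 6 both hash to 6, so 6 is probed on to slot 7.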
Extendible hashing
The main problem with open hashing and closed hashing is that collisions
could cause several blocks to be examined during a find, even for a well-distributed
hash table. Furthermore, when the table gets too full, an
extremely expensive rehashing step must be performed.
Extendible hashing is used to avoid these problems.
In extendible hashing, the hash table (the directory) is stored in main memory
and the buckets are stored on disk.
Each entry in the hash table is a pointer to a bucket in
secondary memory.
The hash function is applied to the input value to select a directory entry, and the bucket
it points to is used for the store and retrieve operations.
Extendible hashing (cont.)
Example: consider data consisting of 6-bit integer values. The root of the
tree contains four pointers determined by the leading two bits of the data.
Each leaf holds up to m = 4 elements.
In each leaf the first two bits are identical; this is indicated by the number in
parentheses. D represents the number of bits used by the root, which is
sometimes known as the directory.
The number of entries in the directory is thus 2^D.
D_L is the number of leading bits that all the elements of some leaf L have in
common, with D_L <= D.
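A minimal in-memory sketch of the structure described above: 6-bit keys, leaves of at most m = 4 keys, and a directory indexed by the leading D bits. The notes do not spell out insertion, so this follows the standard extendible-hashing scheme, which is our assumption. A full leaf is split on its next leading bit (D_L grows by 1), and the directory doubles only when D_L would exceed D. Plain lists stand in for disk buckets. The class names and key values are hypothetical.

```python
from typing import List


class Bucket:
    """A leaf holding up to `capacity` keys; local_depth plays the role of D_L."""

    def __init__(self, local_depth: int) -> None:
        self.local_depth = local_depth
        self.keys: List[int] = []


class ExtendibleHash:
    """Directory indexed by the leading D bits of a fixed-width key."""

    def __init__(self, key_bits: int = 6, capacity: int = 4) -> None:
        self.key_bits = key_bits  # 6-bit keys, as in the example
        self.capacity = capacity  # m = 4 keys per leaf
        self.global_depth = 0     # D: the directory has 2**D entries
        self.directory: List[Bucket] = [Bucket(0)]

    def _index(self, key: int) -> int:
        # The leading D bits of the key select the directory entry.
        return key >> (self.key_bits - self.global_depth)

    def find(self, key: int) -> bool:
        return key in self.directory[self._index(key)].keys

    def insert(self, key: int) -> None:
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)  # full leaf: split it, then retry

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.global_depth:
            if self.global_depth == self.key_bits:
                raise OverflowError("cannot split further: too many identical keys")
            # Double the directory: new entries 2i and 2i+1 both point to old entry i.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        shift = self.key_bits - bucket.local_depth  # position of the newly used bit
        old_keys = bucket.keys
        bucket.keys = [k for k in old_keys if (k >> shift) & 1 == 0]
        sibling.keys = [k for k in old_keys if (k >> shift) & 1 == 1]
        # Entries that pointed at the old leaf and have a 1 in that bit move to the sibling.
        low = self.global_depth - bucket.local_depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> low) & 1 == 1:
                self.directory[i] = sibling


table = ExtendibleHash()
for k in [0b000001, 0b000010, 0b100000, 0b100001, 0b110000, 0b110001, 0b110010]:
    table.insert(k)
print(table.global_depth)  # 2
```

After these inserts, D = 2. Directory entries 00 and 01 still share one leaf with D_L = 1, which illustrates D_L <= D. Entries 10 and 11 point to separate leaves with D_L = 2.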
Given the following input (4322, 1334, 1471, 9679, 1989, 6171, 6173, 4199)
and the hash function x mod 10, which of the following statements are true?
i) 9679, 1989, 4199 hash to the same value
ii) 1471, 6171 hash to the same value
iii) All elements hash to the same value
iv) Each element hashes to a different value
(a) i only
(b) ii only
(c) i and ii only
(d) iii or iv
(GATE 2004)
Answer: C
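Working out each key mod 10 shows why (c) is correct:

```latex
\begin{aligned}
4322 \bmod 10 &= 2 & 1334 \bmod 10 &= 4 & 1471 \bmod 10 &= 1 & 9679 \bmod 10 &= 9\\
1989 \bmod 10 &= 9 & 6171 \bmod 10 &= 1 & 6173 \bmod 10 &= 3 & 4199 \bmod 10 &= 9
\end{aligned}
```

9679, 1989 and 4199 all map to 9, so (i) holds. 1471 and 6171 both map to 1, so (ii) holds. Neither (iii) nor (iv) is true.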
A hash table contains 10 buckets and uses linear probing to resolve collisions.
The key values are integers and the hash function used is key % 10. If the
values 43, 165, 62, 123, 142 are inserted in the table, in what location would
the key value 142 be inserted? (GATE 2005)
(A) 2
(B) 3
(C) 4
(D) 6
Answer: D
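The probe sequence for each insertion, using linear probing with h(key) = key % 10:

```latex
\begin{aligned}
43 \bmod 10  &= 3 &&\Rightarrow \text{slot } 3\\
165 \bmod 10 &= 5 &&\Rightarrow \text{slot } 5\\
62 \bmod 10  &= 2 &&\Rightarrow \text{slot } 2\\
123 \bmod 10 &= 3 &&\Rightarrow 3 \text{ taken, slot } 4\\
142 \bmod 10 &= 2 &&\Rightarrow 2, 3, 4, 5 \text{ taken, slot } 6
\end{aligned}
```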
The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash
table of length 10 using open addressing with hash function h(k) = k mod 10
and linear probing. What is the resultant hash table?
Answer: C
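The answer options are not reproduced in these notes. Tracing the insertions with h(k) = k mod 10 and linear probing gives the resultant table, which should correspond to option C.

```latex
\begin{aligned}
12 &\Rightarrow \text{slot } 2 & 18 &\Rightarrow \text{slot } 8\\
13 &\Rightarrow \text{slot } 3 & 2 &\Rightarrow 2, 3 \text{ taken, slot } 4\\
3 &\Rightarrow 3, 4 \text{ taken, slot } 5 & 23 &\Rightarrow 3, 4, 5 \text{ taken, slot } 6\\
5 &\Rightarrow 5, 6 \text{ taken, slot } 7 & 15 &\Rightarrow 5, 6, 7, 8 \text{ taken, slot } 9
\end{aligned}
```

Resultant table (index: key): 0: _, 1: _, 2: 12, 3: 13, 4: 2, 5: 3, 6: 23, 7: 5, 8: 18, 9: 15.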