
Hashing

The document discusses different data structures and algorithms for searching data including linear search, binary search, hashing, and techniques for resolving collisions in hashing such as open addressing and chaining. It provides details on implementing various searching and hashing methods along with examples and time complexities.

Uploaded by

pramod rockz

UNIT-II

SYLLABUS

Dictionaries: Linear List Representation, Skip List Representation, Operations – Insertion, Deletion and Searching.

Hash table representation: Hash Functions, Collision Resolution - Separate Chaining, Open Addressing - Linear Probing, Quadratic Probing, Double Hashing, Rehashing, Extendible Hashing.

ACE Engineering College(Autonomous)


Linear Search

 Linear search is implemented using the following steps:
 Step 1 - Read the search element from the user.
 Step 2 - Compare the search element with the first element in the list.
 Step 3 - If both match, display "Given element is found!!!" and terminate the function.
 Step 4 - If they do not match, compare the search element with the next element in the list.
 Step 5 - Repeat steps 3 and 4 until the search element has been compared with the last element in the list.
 Step 6 - If the last element in the list also doesn't match, display "Element is not found!!!" and terminate the function.
 Example: Search Element: 18
 Time Complexity: O(n)
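The steps above can be sketched in Python as follows. This is a minimal illustration, not the slide's own program: the function name linear_search and the sample list are assumptions (the list used in the slide's figure was not recovered), and the function returns an index instead of printing a message.

```python
def linear_search(items, target):
    """Return the index of the first element equal to target, or -1 if absent."""
    for index, value in enumerate(items):
        if value == target:
            return index  # Step 3: match found, stop searching
    return -1  # Step 6: the last element did not match either


# Hypothetical sample data, chosen only for illustration.
data = [65, 20, 10, 55, 32, 12, 50, 99]
print(linear_search(data, 12))  # position of 12 in the list
print(linear_search(data, 18))  # -1, because 18 is not in this list
```

In the worst case every element is compared once, which is where the O(n) bound comes from.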



Binary Search
 Binary search is implemented using the following steps:
 Step 1 - Read the search element from the user.
 Step 2 - Find the middle element in the sorted list.
 Step 3 - Compare the search element with the middle element in the sorted list.
 Step 4 - If both match, display "Given element is found!!!" and terminate the function.
 Step 5 - If they do not match, check whether the search element is smaller or larger than the middle element.
 Step 6 - If the search element is smaller than the middle element, repeat steps 2, 3, 4 and 5 for the left sublist of the middle element.
 Step 7 - If the search element is larger than the middle element, repeat steps 2, 3, 4 and 5 for the right sublist of the middle element.
 Step 8 - Repeat the same process until we find the search element in the list or until the sublist contains only one element.
 Step 9 - If that element also doesn't match the search element, display "Element is not found in the list!!!" and terminate the function.
 Time Complexity: O(log n)
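A possible iterative version of these steps is sketched below. The function name binary_search and the sample list are assumptions made for illustration; the input must already be sorted in ascending order.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2          # Step 2: middle of the current sublist
        if sorted_items[mid] == target:  # Steps 3-4: compare and stop on a match
            return mid
        if target < sorted_items[mid]:   # Step 6: continue in the left sublist
            high = mid - 1
        else:                            # Step 7: continue in the right sublist
            low = mid + 1
    return -1                            # Step 9: sublist exhausted, not found


# Hypothetical sorted sample data.
sorted_data = [10, 12, 20, 32, 50, 55, 65, 80, 99]
print(binary_search(sorted_data, 80))
print(binary_search(sorted_data, 18))
```

Each pass halves the sublist still under consideration, which gives the O(log n) bound.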



Drawbacks

 The main drawback of these techniques is:
 As the number of elements increases, the time taken to perform the search also increases (linearly for linear search, logarithmically for binary search).
 This becomes problematic when the total number of elements becomes too large.


Hashing

 Hashing is another approach, in which the time required to search for an element doesn't depend on the total number of elements.
 Hashing is an effective way to reduce the number of comparisons needed to search for an element in a data structure.
 Using a hashing data structure, a given element is searched for in constant time on average, provided the hash function spreads the keys well and collisions are few.
 Hashing is the process of indexing and retrieving an element (data) in a data structure using a hash key, which provides a faster way of finding the element.


Advantage of Hashing

 Unlike other searching techniques:
 Hashing is extremely efficient.
 The time taken to perform the search does not depend upon the total number of elements.
 It completes the search with constant time complexity O(1) on average; with many collisions the worst case can degrade to O(n).


Hashing (Static Hashing)

 There are two concepts in hashing:
 Hash table
 Hash function
 A hash table is a data structure used for storing and retrieving data very quickly.
 Insertion of data into the hash table is based on the key value.
 A hash function is a function which takes a piece of data (i.e. the key) as input and produces an integer (i.e. the hash value) as output, which maps the data to a particular index in the hash table.



Hashing Functions

 There are various hash functions:
 Division method
 Mid square method
 Multiplicative hash function
 Digit folding


Division Method

 The hash function depends upon the remainder of division.
 The divisor is the table length.
 The formula to calculate the hash key is:
 H(key) = record % size
 For example, suppose the records 54, 72, 89, 37 are to be placed in a hash table of size 10:
 54%10=4
 72%10=2
 89%10=9
 37%10=7
 Resulting hash table:
Index  Record
0
1
2      72
3
4      54
5
6
7      37
8
9      89

Mid Square Method

 In this method, the key is squared and the middle part of the result is used as the index.
 Consider the key 3111; then
 3111^2 = 9678321
 For a hash table of size 1000,
 H(3111) = 783 (the middle 3 digits)
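A small sketch of the mid square method follows. The function name mid_square_hash and the rule used to pick the "middle" when the square has an even number of digits (the slice is centred, rounding down) are assumptions; the slide only shows the odd-length case.

```python
def mid_square_hash(key, digits=3):
    """Square the key and return its middle `digits` digits as an integer."""
    squared = str(key * key).zfill(digits)  # pad so short squares still work
    start = (len(squared) - digits) // 2    # left edge of the middle slice
    return int(squared[start:start + digits])


print(mid_square_hash(3111))  # 3111^2 = 9678321, middle three digits are 783
```

With digits=3 the result is always in the range 0 to 999, which suits a table of size 1000.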



Multiplicative Hash Function

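 The given record is multiplied by some constant value.
 The formula for computing the hash key is:
H(key) = floor(p * (fractional part of (key * A)))
where p is an integer constant (the result lies in the range 0 to p-1) and
A is a constant real number with 0 < A < 1.
Donald Knuth suggested using the constant A = 0.6180339887 (approximately (sqrt(5) - 1) / 2).
If the key is 107 and p = 50, then
key * A = 107 * 0.6180339887 = 66.1296367909, whose fractional part is 0.1296367909
H(key) = floor(50 * 0.1296367909)
= floor(6.481839545)
= 6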
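The formula can be checked with a short Python sketch. The name multiplicative_hash is an assumption, and A is computed as (sqrt(5) - 1) / 2, which is the value Knuth's constant approximates. Taking the fractional part of key * A before multiplying by p, as the formula states, gives 6 for key 107 and p = 50.

```python
import math

A = (math.sqrt(5) - 1) / 2  # about 0.6180339887


def multiplicative_hash(key, p):
    """Return floor(p * frac(key * A)), a value in the range 0 .. p-1."""
    fractional = (key * A) % 1.0  # keep only the fractional part of key * A
    return math.floor(p * fractional)


print(multiplicative_hash(107, 50))  # 6
```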



Digit Folding

 The key is divided into separate parts, and these parts are combined using some simple operation to produce the hash key.
 For example, consider a record 12365412
 Divide it into separate parts: 123 654 12
 Add all these parts:
 H(key) = 123 + 654 + 12
 = 789
 The record will be placed at location 789 in the hash table.
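Digit folding with addition can be sketched as below. The function name, the part length of 3 and the final reduction modulo a table size of 1000 are assumptions; the slide does not say what happens when the sum exceeds the table size.

```python
def digit_folding_hash(key, part_len=3, table_size=1000):
    """Split the decimal digits into parts, add them, reduce modulo table_size."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % table_size


print(digit_folding_hash(12365412))  # parts 123, 654, 12 -> 789
```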



COLLISION

 A collision occurs when the hash function maps two different keys to the same location. Obviously, two records cannot be stored in the same location.

 Similarly, when there is no room left for a key in the hash table, the situation is called overflow.


Collision

Example:
 Consider a hash function H(key) = recordkey % 10, with hash table size = 10
 The record keys are: 131, 44, 43, 78, 19, 36, 57 and 77
 131%10=1
 44%10=4
 43%10=3
 78%10=8
 19%10=9
 36%10=6
 57%10=7
 77%10=7  Collision (index 7 already holds 57)
 Hash table before inserting 77:
Index  Key
0
1      131
2
3      43
4      44
5
6      36
7      57
8      78
9      19
 From index 7, if we look for the next vacant position at the subsequent indices 8 and 9, there is no free place there either. This situation is called overflow.


Collision Resolution/Overflow Handling

 Therefore, a method known as a collision resolution technique is applied to solve the problem of collisions.
 The two most popular methods of resolving collisions are:
 Collision resolution by open addressing
 Collision resolution by chaining (separate chaining)


Open Addressing

 Once a collision takes place, open addressing computes new positions using a probe sequence, and the record is stored in the first free position found.
 In this technique of collision resolution, all the values are stored in the hash table itself.
 Each slot of the hash table contains one of two types of values:
 a sentinel value (for example, -1), or
 a data value.
 The presence of the sentinel value indicates that the location contains no data value at present but can be used to hold one.
 The process of examining memory locations in the hash table is called probing.

Open Addressing Techniques

 Open addressing can be implemented using:
 Linear probing
 Quadratic probing
 Double hashing
 Rehashing


Linear Probing

 When two records demand the same home bucket in the hash table, the collision can be resolved by moving linearly down the table and placing the second record in the first empty bucket found.
 In linear probing, the hash table is represented as a one-dimensional array with indices ranging from 0 to (hash table size - 1).
 Before inserting, initialize all slots in the table to be empty.
 This initialization allows us to detect collisions and overflows when we insert into the hash table.


Linear Probing

 For example, Keys: 131, 4, 8, 7, 21, 5, 31, 61, 9, 29
 Hash table size: 10
 Step 1: Initialize all locations with -1.
 Step 2: First put 131, 4, 8, 7
 131%10=1 4%10=4 8%10=8 7%10=7
 Next key is 21
 H(key)=21%10=1 Collision (with 131)
 To resolve this collision we move linearly down and place the element at the next empty location, index 2.
 Next, 5 is placed at index 5.
 31 and 61 also follow linear probing: both hash to index 1, so 31 is placed at index 3 and 61 at index 6.
 The next record, 9, is placed at index 9.
 The final record is 29: 29%10=9 causes a collision and an overflow past the end of the table. To handle it we move back to index 0; since index 0 is empty, 29 is placed there.
 Final hash table:
Index  Key
0      29
1      131
2      21
3      31
4      4
5      5
6      61
7      7
8      8
9      9
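The linear probing example can be traced with the following sketch. The function name linear_probe_insert is an assumption; -1 is used as the sentinel for an empty slot, as in the slides, and the probe wraps around to index 0 when it runs past the end of the table.

```python
EMPTY = -1  # sentinel value marking an unused slot


def linear_probe_insert(table, key):
    """Insert key with linear probing and return the index where it was stored."""
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size  # wrap around past the last slot
        if table[index] == EMPTY:
            table[index] = key
            return index
    raise OverflowError("hash table is full")


probe_table = [EMPTY] * 10
for key in (131, 4, 8, 7, 21, 5, 31, 61, 9, 29):
    linear_probe_insert(probe_table, key)

print(probe_table)  # [29, 131, 21, 31, 4, 5, 61, 7, 8, 9]
```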

GATE Question

 Consider a hash table of size seven, with starting index zero, and a hash function (3x + 4) mod 7. Assuming the hash table is initially empty, which of the following is the contents of the table when the sequence 1, 3, 8, 10 is inserted into the table using closed hashing? Note that '_' denotes an empty location in the table. (GATE 2007)
 (A) 8, _, _, _, _, _, 10
(B) 1, 8, 10, _, _, _, 3
(C) 1, _, _, _, _, _, 3
(D) 1, 10, 8, _, _, _, 3
 Answer: B
 Explanation (resolving collisions by linear probing): h(1) = 7 mod 7 = 0 and h(3) = 13 mod 7 = 6. h(8) = 28 mod 7 = 0 collides with 1, so 8 moves to index 1. h(10) = 34 mod 7 = 6 collides with 3, wraps around past the occupied indices 0 and 1, and lands at index 2.


 What is hashing? (2 marks, R15, Nov/Dec 2016)
 Explain various hashing methods with suitable examples.
 What is a collision? Explain different collision resolution techniques with examples. (10 marks, R15, Nov/Dec 2016)
 Discuss linear probing. (R18, Dec 2019)


HASHING

OPEN ADDRESSING

QUADRATIC PROBING



Limitations of Linear Probing

 One major problem with linear probing is primary clustering.
 Primary clustering is the formation of a block (cluster) of occupied slots in the hash table when collisions are resolved.
 For example: 19, 18, 39, 29, 8
 19%10=9
 18%10=8
 39%10=9
 29%10=9
 8%10=8
 Resulting hash table:
Index  Key
0      39    (cluster is formed)
1      29
2      8
3            (rest of the table is empty)
4
5
6
7
8      18
9      19
 The primary clustering problem can be solved by quadratic probing.

Quadratic Probing

 Quadratic probing is similar to linear probing.
 It operates by taking the original hash value and adding successive values of an arbitrary quadratic polynomial to the starting value.
 This method uses the following formula:
 H_i(key) = (Hash(key) + i^2) % m //where m can be the table size or any prime number


Example on Quadratic Probing

 For example, Hash Table size = 10, H_i(key) = (Hash(key) + i^2) % m
 Elements to be inserted are: 37, 90, 55, 22, 11, 17, 49, 87
 We will fill the hash table step by step
 37%10=7 90%10=0 55%10=5 22%10=2 11%10=1
 17%10=7  Collision
 We will apply quadratic probing to insert 17 into the hash table
 We will choose value i = 1, 2, 3, ..., whichever is applicable
 Consider i=1, then (7+1^2)%10=8
 Bucket 8 is empty, hence we place the element at index 8
 Next, 49%10=9
 Now to place 87: 87%10=7  Collision, so we use quadratic probing
 (7+1)%10=8  Collision
 (7+4)%10=1  Collision
 (7+9)%10=6, this bucket is free, so we place 87 at index 6
 Final hash table:
Index  Key
0      90
1      11
2      22
3
4
5      55
6      87
7      37
8      17
9      49
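The quadratic probing example can be traced with the sketch below. The function name quadratic_probe_insert and the use of None for an empty slot are assumptions. The loop starts at i = 0, so the first slot tried is the home position itself.

```python
def quadratic_probe_insert(table, key):
    """Insert key using H_i(key) = (key % m + i*i) % m and return the index used."""
    m = len(table)
    home = key % m
    for i in range(m):  # i*i mod m repeats with period m, so m tries suffice
        index = (home + i * i) % m
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("no free slot on this key's probe sequence")


quad_table = [None] * 10
for key in (37, 90, 55, 22, 11, 17, 49, 87):
    quadratic_probe_insert(quad_table, key)

print(quad_table)  # [90, 11, 22, None, None, 55, 87, 37, 17, 49]
```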

Limitations of Quadratic Probing

 Secondary clustering: keys that hash to the same home position follow the same probe sequence, so they keep competing for the same slots.
 No guaranteed free cell: when using quadratic probing, there is no guarantee of finding an empty cell once the table becomes more than half full, or even before this if the table size is composite (not prime), because collisions must be resolved using at most half of the table.
 For example, if our hash table has three slots, then records that hash to slot 0 can probe only slots 0 and 1 (that is, the probe sequence will never visit slot 2 in the table). Thus, if slots 0 and 1 are full, the record cannot be inserted even though the table is not full!
 A more realistic example is a table with 105 slots. The probe sequence starting from any given slot will only visit 23 other slots in the table. If all 24 of these slots should happen to be full, even if other slots in the table are empty, then the record cannot be inserted because the probe sequence will continually hit only those same 24 slots.
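The slot counts quoted above can be checked numerically. The helper name reachable_slots is an assumption; it simply collects every index that the probe sequence (home + i^2) % m can ever produce.

```python
def reachable_slots(home, table_size):
    """Set of indices visited by the probe sequence (home + i*i) % table_size."""
    # i*i mod table_size repeats with period table_size, so one period is enough.
    return {(home + i * i) % table_size for i in range(table_size)}


print(sorted(reachable_slots(0, 3)))  # [0, 1]: slot 2 is never visited
print(len(reachable_slots(0, 105)))   # 24: the home slot plus 23 others
print(len(reachable_slots(0, 11)))    # 6: about half of a prime-sized table
```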



Quadratic Probing

 Secondary clustering example: 19, 18, 39, 29, 8


 Exercise: hash the following keys into a table of size 11 using any two collision resolution techniques: 23, 0, 52, 61, 78, 33, 100, 8, 90, 10, 14


HASHING

OPEN ADDRESSING

DOUBLE HASHING



Double Hashing

 Double hashing is a technique in which a second hash function is applied to the key when a collision occurs.
 Double hashing can be done using:
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE is the size of the hash table.
(We repeat with increasing i while collisions occur.)
 The second hash function gives the number of positions to jump from the point of collision before inserting.
 There are two important rules to be followed for the second function:
 It must never evaluate to zero.
 It must make sure that all cells can be probed.
 The formulas to be used for double hashing are:
 H1(key) = key % Table size
 H2(key) = M - (key % M)
 where M is a prime number smaller than the size of the table.


Example on Double Hashing

 H1(key) = key % Table size
 H2(key) = M - (key % M)
 Consider the following elements to be placed in a hash table of size 10: 37, 90, 45, 22, 17, 49, 55
 Initially insert the elements using the formula for H1(key).
 Insert 37, 90, 45, 22
 H1(37)=37%10=7
 H1(90)=90%10=0
 H1(45)=45%10=5
 H1(22)=22%10=2
 Now if 17 is to be inserted, then
 H1(17)=17%10=7  Collision, so we have to apply the second hash function
 H2(key)=M-(key%M), where M is a prime number smaller than the table size. The largest prime smaller than 10 is 7.
 Hence M=7
 H2(17)=7-(17%7)=7-3=4 //so we take 4 jumps from index 7: (7+4)%10=1
 Therefore 17 is placed at index 1.
 Next, 49 is placed at 49%10=9
 Now to insert 55
 H1(55)=55%10=5  Collision
 H2(55)=7-(55%7)=7-6=1 //one jump from index 5: (5+1)%10=6, so 55 is placed at index 6
 Final hash table:
Index  Key
0      90
1      17
2      22
3
4
5      45
6      55
7      37
8
9      49
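The double hashing example can be traced with the following sketch. The function name double_hash_insert and the parameter name prime (the M of the formula) are assumptions, and None marks an empty slot.

```python
def double_hash_insert(table, key, prime):
    """Insert key using (H1 + i * H2) % size and return the index used."""
    size = len(table)
    h1 = key % size
    h2 = prime - (key % prime)  # always between 1 and prime, never zero
    for i in range(size):
        index = (h1 + i * h2) % size
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("no free slot on this key's probe sequence")


dh_table = [None] * 10
for key in (37, 90, 45, 22, 17, 49, 55):
    double_hash_insert(dh_table, key, prime=7)

print(dh_table)  # [90, 17, 22, None, None, 45, 55, 37, None, 49]
```

Note that with a table size of 10 a step such as H2 = 4 shares a factor with the table size, so that key's probe sequence cannot reach every cell. A prime table size avoids this and satisfies the rule that all cells can be probed.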

Practice Question on Double Hashing

 Keys: 46, 28, 21, 35, 57, 39, 19, 50
 h1(x) = x mod 11
 h2(x) = M - (x mod M) // where M=7


Rehashing

 When the hash table becomes too full in open addressing hashing, successive insertion operations take more time to complete. To overcome this situation, the rehashing technique can be used.
 In rehashing, another hash table is built that is about twice as big. The entire original hash table is scanned, and for each element the new hash value is computed and the element is inserted into the new table.
 For example: insert 13, 15, 24, 6, 23 into a hash table of size 7
 Hash function: h(x) = x mod 7

Rehashing (cont.)

 Since the table is now over 70 percent full (5 of its 7 slots are occupied), a new table is created. The size of this table will be 17, because this is the first prime greater than twice the old table size (2 * 7 = 14).
 New hash function: h(x) = x mod 17
 Now the old table is scanned and the elements are inserted into the new table using h(x) = x mod 17.

 Rehashing can be implemented in several ways:


 Rehash as soon as the table is half full.

 Rehash only when an insertion fails.

 Rehash when the table reaches a certain load factor.
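The rehashing example (keys 13, 15, 24, 6, 23 moved from a table of size 7 to one of size 17) can be sketched as follows. Several things here are assumptions rather than part of the slides: collisions are resolved by linear probing, the helper names next_prime, probe_insert and rehash are made up for illustration, and rehashing is triggered by hand instead of by a load factor check.

```python
def next_prime(n):
    """Return the smallest prime greater than or equal to n."""
    def is_prime(k):
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True

    while not is_prime(n):
        n += 1
    return n


def probe_insert(table, key):
    """Insert key with h(x) = x mod len(table) and linear probing."""
    size = len(table)
    for step in range(size):
        index = (key % size + step) % size
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("hash table is full")


def rehash(old_table):
    """Build a table of the first prime size >= 2 * old size and reinsert all keys."""
    new_table = [None] * next_prime(2 * len(old_table))
    for key in old_table:  # scan the old table from top to bottom
        if key is not None:
            probe_insert(new_table, key)
    return new_table


old_table = [None] * 7
for key in (13, 15, 24, 6, 23):
    probe_insert(old_table, key)

new_table = rehash(old_table)
print(old_table)       # [6, 15, 23, 24, None, None, 13]
print(len(new_table))  # 17
print({i: k for i, k in enumerate(new_table) if k is not None})
# {6: 6, 7: 23, 8: 24, 13: 13, 15: 15}
```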



Extendible hashing

 The major problem with open hashing or closed hashing is that collisions could cause several blocks to be examined during a find, even for a well-distributed hash table. Furthermore, when the table gets too full, an extremely expensive rehashing step must be performed.
 To avoid these problems, extendible hashing is used.
 In extendible hashing, the hash table (directory) is stored in main memory and the buckets are stored on disk.
 Each entry in the hash table is a pointer to a bucket in secondary memory.
 The hash function is applied to the input value to generate the bucket pointer, which is then used to perform the store and retrieve operations.

Extendible hashing (cont.)

 Example: suppose our data consists of 6-bit integer values. The root of the tree contains four pointers determined by the leading two bits of the data. Each leaf has up to m=4 elements.
 In each leaf the first two bits are identical; this is indicated by the number in parentheses. D represents the number of bits used by the root, which is sometimes known as the directory.
 The number of entries in the directory is thus 2^D.
 D_L is the number of leading bits that all the elements of some leaf L have in common. D_L <= D.


Directory (D = 2):
00        01        10        11
(2)       (2)       (2)       (2)
000100    010100    100000    111000
001000    011000    101000    111001
001010              101100
001011              101110
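The directory lookup in this example can be sketched as below. Only the lookup step is shown: leaf splitting and directory doubling are not covered, and the names directory_index, BITS, DEPTH and leaves are assumptions. The data values themselves stand in for hash values, as in the example.

```python
BITS = 6   # width of each value in the example
DEPTH = 2  # D: number of leading bits used by the directory


def directory_index(value, depth=DEPTH, bits=BITS):
    """Return the directory entry selected by the leading `depth` bits."""
    return value >> (bits - depth)


values = [0b000100, 0b001000, 0b001010, 0b001011,
          0b010100, 0b011000,
          0b100000, 0b101000, 0b101100, 0b101110,
          0b111000, 0b111001]

# One leaf per directory entry; there are 2**D entries in total.
leaves = {entry: [] for entry in range(2 ** DEPTH)}
for value in values:
    leaves[directory_index(value)].append(format(value, "06b"))

for entry, leaf in leaves.items():
    print(format(entry, "02b"), leaf)
```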

Gate Questions on Hashing

 Given the following input (4322, 1334, 1471, 9679, 1989, 6171, 6173, 4199) and the hash function x mod 10, which of the following statements are true? (GATE 2004)
i) 9679, 1989, 4199 hash to the same value
ii) 1471, 6171 hash to the same value
iii) All elements hash to the same value
iv) Each element hashes to a different value
(a) i only
(b) ii only
(c) i and ii only
(d) iii or iv
 Answer: C


Gate Questions on Hashing

 A hash table contains 10 buckets and uses linear probing to resolve collisions. The key values are integers and the hash function used is key % 10. If the values 43, 165, 62, 123, 142 are inserted in the table, in what location would the key value 142 be inserted? (GATE 2005)
 (A) 2
(B) 3
(C) 4
(D) 6
 Answer: D
 Explanation: 43 goes to index 3, 165 to index 5 and 62 to index 2. 123 hashes to 3 (occupied) and moves to index 4. 142 hashes to 2 and probes the occupied indices 2, 3, 4 and 5 before landing at index 6.


Gate Questions on Hashing

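 The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash table of length 10 using open addressing with hash function h(k) = k mod 10 and linear probing. What is the resultant hash table?

 Answer: C (the answer options were given as figures and are not reproduced here)
 Working it through: 12 goes to index 2, 18 to 8, 13 to 3, 2 to 4, 3 to 5, 23 to 6, 5 to 7 and 15 to 9; indices 0 and 1 remain empty.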
