Lec5 - Hashing

Hash tables store elements in an array using a hash function to map elements to indices. Lookup, add, and remove operations have O(1) time complexity on average. Collisions occur when two elements hash to the same index, requiring collision resolution like chaining or probing. Probing strategies like linear probing can cause clustering that degrades performance, while quadratic and double hashing reduce clustering. The load factor affects average time complexity, with higher load indicating more collisions.

Uploaded by

Nour Hesham

Hash tables
• hash table: an array of some fixed size that positions elements according to an algorithm called a hash function
[Figure: elements (e.g., strings) pass through the hash function h(element) to reach a slot of the hash table, an array indexed 0 … length - 1]
Dr Amr ElMasry & Dr Mervat Mikhail 1

Hashing and hash functions
• The idea: map every element to some index in the array ("hash" it); that index is the one and only place the element should go
  • Lookup becomes constant-time: simply look at that one slot again later to see if the element is there
  • add, remove, and contains all become O(1) on average
• For now, let's look at integers (int)
  • a "hash function" h for int is trivial: store int i at index i (a direct mapping)
  • if i >= array.length, store i at index (i % array.length)
  • h(i) = i % array.length

Hash function example
• elements = Integers
• h(i) = i % 10
• add 41, 34, 7, and 18
• constant-time lookup:
  • just look at i % 10 again later

index  element
0
1      41
2
3
4      34
5
6
7      7
8      18
9

• Hash tables have no ordering information! The following are expensive:
  • getMin, getMax, removeMin, removeMax
  • the various ordered traversals
  • printing items in sorted order

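The example above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the class name `SimpleIntHashSet` is hypothetical, it assumes non-negative ints, and it deliberately ignores collisions.

```python
class SimpleIntHashSet:
    """Toy hash set for non-negative ints; collisions are NOT handled."""

    def __init__(self, length=10):
        self.table = [None] * length  # None marks an empty slot

    def h(self, i):
        # The hash function from the slides: h(i) = i % array.length
        return i % len(self.table)

    def add(self, i):
        # Overwrites whatever is in the slot; fine only without collisions
        self.table[self.h(i)] = i

    def contains(self, i):
        # Constant-time lookup: look at the one slot where i would live
        return self.table[self.h(i)] == i


s = SimpleIntHashSet(10)
for value in (41, 34, 7, 18):
    s.add(value)
print(s.table)
```

Each operation touches exactly one slot, which is where the O(1) cost comes from.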
Hash collisions
• collision: the event that two hash table elements map into the same slot in the array
• example: add 41, 34, 7, 18, then 21
  • 21 hashes into the same slot as 41!
  • 21 should not replace 41 in the hash table; they should both be there
• collision resolution: a strategy for fixing collisions in a hash table

index  element
0
1      41   <- 21 hashes here too
2
3
4      34
5
6
7      7
8      18
9

Chaining
• chaining: all keys that map to the same hash value are kept in a linked list

index  chain
0      10
1
2      22 -> 12 -> 42
3
4
5
6
7      107
8
9

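A minimal sketch of chaining, assuming integer keys and h(key) = key % array.length. The class name `ChainingHashSet` is hypothetical, Python lists stand in for the linked lists, and the insertion order below is an assumption chosen to reproduce the chains in the figure.

```python
class ChainingHashSet:
    """Hash set that resolves collisions by keeping a chain per slot."""

    def __init__(self, length=10):
        # One (initially empty) chain per array slot
        self.buckets = [[] for _ in range(length)]

    def _index(self, key):
        return key % len(self.buckets)

    def add(self, key):
        chain = self.buckets[self._index(key)]
        if key not in chain:      # keep set semantics: no duplicates
            chain.append(key)

    def contains(self, key):
        # Only the one chain the key hashes to has to be searched
        return key in self.buckets[self._index(key)]

    def remove(self, key):
        chain = self.buckets[self._index(key)]
        if key in chain:
            chain.remove(key)


chained = ChainingHashSet(10)
for key in (10, 22, 107, 12, 42):
    chained.add(key)
print(chained.buckets[2])
```

Lookup cost is proportional to the length of a single chain, so it stays small as long as the chains stay short.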
Open Addressing
• Open addressing is
  • a collision resolution strategy
  • on a collision, look for another empty spot in the array
• examples of open addressing
  • linear probing
  • quadratic probing
  • double hashing
• Lookup in an open addressing scheme must keep probing until it finds the item or reaches an empty slot.

Linear probing
• linear probing: resolving collisions in slot i by putting the colliding element into the next available slot (i+1, i+2, ...)
• add 41, 34, 7, 18, then 21, then 57
  • 21 collides (41 is already there), so we search ahead until we find the empty slot 2
  • 57 collides (7 is already there), so we search ahead twice until we find the empty slot 9
• the lookup algorithm becomes slightly modified; we now have to loop until we find the element or an empty slot
• what happens when the table gets mostly full?

index  element
0
1      41
2      21
3
4      34
5
6
7      7
8      18
9      57

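The insert and lookup loops described above might look like the following. This is a sketch under stated assumptions: non-negative integer keys, h(key) = key % array.length, no removal, and a hypothetical class name `LinearProbingHashSet`.

```python
class LinearProbingHashSet:
    """Open-addressing hash set using linear probing (no removal)."""

    def __init__(self, length=10):
        self.table = [None] * length  # None marks an empty slot

    def add(self, key):
        n = len(self.table)
        start = key % n
        for step in range(n):                 # try i, i+1, i+2, ... wrapping around
            slot = (start + step) % n
            if self.table[slot] is None or self.table[slot] == key:
                self.table[slot] = key
                return slot
        raise RuntimeError("hash table is full")

    def contains(self, key):
        n = len(self.table)
        start = key % n
        for step in range(n):
            slot = (start + step) % n
            if self.table[slot] is None:      # empty slot: key cannot be further on
                return False
            if self.table[slot] == key:
                return True
        return False                          # scanned the whole (full) table


lp = LinearProbingHashSet(10)
for key in (41, 34, 7, 18, 21, 57):
    lp.add(key)
print(lp.table)
```

Removal is left out on purpose: simply emptying a slot would cut the probe sequence of later elements, so real implementations need extra bookkeeping for it.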
Clustering problem
• clustering: elements being placed close together by probing, which degrades the hash table's performance
• add 89, 18, 49, 58, 9 (with linear probing)
• now searching for the value 28 has to probe slots 8, 9, 0, 1, 2, and 3 (more than half the hash table) before it reaches an empty slot! no longer constant time...

index  element
0      49
1      58
2      9
3
4
5
6
7
8      18
9      89

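To make the cost of the cluster concrete, the sketch below rebuilds this table with linear probing and counts how many slots a lookup of 28 inspects. The helper names `linear_probe_insert` and `probes_for_lookup` are hypothetical, and non-negative integer keys are assumed.

```python
def linear_probe_insert(table, key):
    """Place key in the first free slot at or after key % len(table)."""
    n = len(table)
    for step in range(n):
        slot = (key + step) % n
        if table[slot] is None:
            table[slot] = key
            return
    raise RuntimeError("hash table is full")


def probes_for_lookup(table, key):
    """Count the slots inspected until key or an empty slot is found."""
    n = len(table)
    for step in range(n):
        slot = (key + step) % n
        if table[slot] is None or table[slot] == key:
            return step + 1
    return n


cluster = [None] * 10
for key in (89, 18, 49, 58, 9):
    linear_probe_insert(cluster, key)
print(cluster)
print(probes_for_lookup(cluster, 28))
```

The five stored elements form one contiguous run (slots 8, 9, 0, 1, 2), so any lookup that starts inside the run has to walk to its end.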
Quadratic probing
• quadratic probing: resolving collisions on slot i by putting the colliding element into slot i+1, i+4, i+9, i+16, ... (wrapping around the end of the array)
• add 89, 18, 49, 58, 9
  • 49 collides (89 is already there), so we search ahead by +1 to empty slot 0
  • 58 collides (18 is already there), so we search ahead by +1 to occupied slot 9, then +4 to empty slot 2
  • 9 collides (89 is already there), so we search ahead by +1 to occupied slot 0, then +4 to empty slot 3
• clustering is reduced
• what is the lookup algorithm?

index  element
0      49
1
2      58
3      9
4
5
6
7
8      18
9      89

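One possible answer to the lookup question: follow exactly the same probe sequence that insertion uses, and stop at the element or at the first empty slot. The sketch below assumes non-negative integer keys and no removal; `QuadraticProbingHashSet` is a hypothetical name.

```python
class QuadraticProbingHashSet:
    """Open-addressing hash set using quadratic probing (no removal)."""

    def __init__(self, length=10):
        self.table = [None] * length

    def _probe(self, key):
        # Yields slots i, i+1, i+4, i+9, ... (mod array length)
        n = len(self.table)
        home = key % n
        for k in range(n):
            yield (home + k * k) % n

    def add(self, key):
        for slot in self._probe(key):
            if self.table[slot] is None or self.table[slot] == key:
                self.table[slot] = key
                return slot
        raise RuntimeError("no free slot found on the probe sequence")

    def contains(self, key):
        for slot in self._probe(key):
            if self.table[slot] is None:   # insertion would have stopped here
                return False
            if self.table[slot] == key:
                return True
        return False


qp = QuadraticProbingHashSet(10)
for key in (89, 18, 49, 58, 9):
    qp.add(key)
print(qp.table)
```

Unlike linear probing, the quadratic sequence does not visit every slot, so an insertion can fail even when the table still has free slots; a prime table size and a load factor of at most 0.5 are the usual way to avoid that.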
Double Hashing
• You have a primary hash function i = hash1(x)
• Pick a secondary hash function hash2(x).
• when hashing item x, resolve collisions on slot i = hash1(x) by putting the colliding element into slot i + hash2(x), i + 2*hash2(x), i + 3*hash2(x), i + 4*hash2(x), ...
• Ex. Suppose hash1(x) = x % 13, hash2(x) = 7 - (x % 7).
• add 18, 41, 22, 44: What happens?
  • Put 18 in slot 5
  • Put 41 in slot 2
  • Put 22 in slot 9
  • 44 collides (18 is already there); hash2(x) = 5, so check location i + 5 next; put 44 in slot 10.
• what is the lookup algorithm?

index  element
0
1
2      41
3
4
5      18
6
7
8
9      22
10     44
11
12

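The example can be traced with a short sketch that uses the two hash functions from the slide. It assumes non-negative integer keys, a table of length 13, and no removal; the class name `DoubleHashingHashSet` is hypothetical. Lookup walks the same probe sequence as insertion.

```python
class DoubleHashingHashSet:
    """Open-addressing hash set using double hashing (no removal)."""

    def __init__(self, length=13):
        self.table = [None] * length

    def hash1(self, x):
        return x % len(self.table)   # x % 13 for the default length

    def hash2(self, x):
        return 7 - (x % 7)           # always between 1 and 7, never 0

    def _probe(self, x):
        # Yields slots i, i + hash2(x), i + 2*hash2(x), ... (mod array length)
        n = len(self.table)
        home, step = self.hash1(x), self.hash2(x)
        for k in range(n):
            yield (home + k * step) % n

    def add(self, x):
        for slot in self._probe(x):
            if self.table[slot] is None or self.table[slot] == x:
                self.table[slot] = x
                return slot
        raise RuntimeError("no free slot found on the probe sequence")

    def contains(self, x):
        for slot in self._probe(x):
            if self.table[slot] is None:
                return False
            if self.table[slot] == x:
                return True
        return False


dh = DoubleHashingHashSet(13)
slots = [dh.add(x) for x in (18, 41, 22, 44)]
print(slots)
```

Because hash2 never returns 0, the probe always moves on, and because 13 is prime, every step size eventually reaches every slot.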
Writing a hash function
We want a hash function to:
1. be simple/fast to compute
2. map equal elements to the same index
3. map different elements to different indices as
much as possible
4. have keys distributed evenly among indices

Hash functions for strings
• view a string by its letters:
  • String s: $s_0, s_1, s_2, \ldots, s_{n-1}$
• a possible hash function:
  • treat each character as an int, sum them, and hash on that
  • $h(s) = \left( \sum_{i=0}^{n-1} s_i \right) \% \text{array.length}$
  • what's wrong with this hash function? When will strings collide?
• another option:
  • perform a weighted sum of the letters, and hash on that
  • $h(s) = \left( \sum_{i=0}^{n-1} w_i \, s_i \right) \% \text{array.length}$, for some position-dependent weights $w_i$

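Both options can be compared in a short sketch. The function names `sum_hash` and `weighted_hash` are hypothetical, and using powers of 31 as weights is an assumption made for illustration (a common choice), not necessarily the weighting used in the lecture.

```python
def sum_hash(s, length):
    """Plain sum of character codes, reduced modulo the array length."""
    return sum(ord(ch) for ch in s) % length


def weighted_hash(s, length, base=31):
    """Weighted sum: the character at position i gets weight base**(n-1-i)."""
    total = 0
    for ch in s:
        total = total * base + ord(ch)   # Horner's rule for the weighted sum
    return total % length


# Anagrams contain the same letters, so the plain sum always collides
print(sum_hash("stop", 10), sum_hash("pots", 10))

# With position-dependent weights the order of the letters matters
print(weighted_hash("stop", 1000003), weighted_hash("pots", 1000003))
```

This also hints at the answer to the question above: the plain sum ignores letter order, and for short strings it only produces small values, so large tables are used unevenly.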
Analysis of hash tables
• main operation: lookup of an item in the table
• What is the worst-case cost of finding an item? O(n)
• Worst-case analysis is not very informative for hash tables; look at the average-case cost instead
• The cost depends heavily on the load factor
• The load factor measures how full the table is
• The load factor λ of a hash table is the ratio $\lambda = N / M$, where
  N = number of elements inserted in the hash table
  M = array size

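To give a feel for how the average cost grows with the load factor, the standard textbook approximations for linear probing can be written as below. They come from Knuth's classical analysis, assume a hash function that spreads keys uniformly, and are not taken from these slides; λ is the load factor N / M.

```latex
\text{probes per successful lookup} \approx \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right),
\qquad
\text{probes per unsuccessful lookup or insertion} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^{2}}\right)
```

At λ = 0.5 these give about 1.5 and 2.5 probes; at λ = 0.9 they give about 5.5 and 50.5, which is why the table should not be allowed to get mostly full.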
Rehashing and hash table size
• rehash: increasing the size of a hash table's array, and re-storing all of the items into the new array using the hash function
  • can we just copy the old contents to the larger array?
• When should we rehash? Some options:
  • when the load factor reaches a certain level (e.g., λ = 0.5)
  • when an insertion fails
• What is the cost (Big-Oh) of rehashing?
• what is a good hash table array size?
  • how much bigger should a hash table get when it grows?

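A minimal sketch of rehashing on top of linear probing. The assumptions are: non-negative integer keys, a rehash once the load factor reaches 0.5, and doubling the array on each rehash (one common growth policy; the slides leave the choice open). The function names are hypothetical.

```python
def insert(table, key):
    """Linear-probing insert using h(key) = key % len(table)."""
    n = len(table)
    for step in range(n):
        slot = (key + step) % n
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return
    raise RuntimeError("hash table is full")


def rehash(old_table):
    """Build an array twice as large and re-insert every element."""
    new_table = [None] * (2 * len(old_table))
    for key in old_table:
        if key is not None:
            insert(new_table, key)   # the index is recomputed for the new length
    return new_table


def add(table, key, max_load=0.5):
    """Insert key, then grow the table once the load factor reaches max_load."""
    insert(table, key)
    count = sum(1 for k in table if k is not None)  # O(M) here; a real table keeps a counter
    if count / len(table) >= max_load:
        table = rehash(table)
    return table


grown = [None] * 10
for key in (41, 34, 7, 18, 21, 57):
    grown = add(grown, key)
print(len(grown), grown)
```

Copying the old array slot by slot would not work, because h depends on the array length: 34 sits at index 4 in the table of size 10 but belongs at index 14 in the table of size 20. Every element is re-inserted once, so a rehash costs O(n) on average.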
Ensuring efficient hash tables
• To get O(1) average-case performance for lookups and insertions, we need:
  • a good hash function
    • one that distributes objects evenly among all buckets
  • a load factor that is not too high
    • choose a table size appropriate to the number of elements you expect to store
  • to keep rehashing to a minimum
    • choose the largest initial capacity you can reasonably afford

Hash versus tree
• Which is better, a hash table or a search tree?

Hash                          Tree
- Better average for          - Guarantee on worst-case search time
  lookup and insertion        - Possible successor and predecessor search
                              - Easy to access items in sorted order
