Rabin-Karp
Brute Force
• The Brute Force algorithm compares the pattern to the text one character at a time until a mismatch is found, then shifts the pattern one position to the right and starts over.
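A minimal sketch of the brute-force search described above. The function name `brute_force_search` is hypothetical; it tries every alignment and stops each one at the first mismatching character.

```python
from typing import List


def brute_force_search(text: str, pattern: str) -> List[int]:
    """Return every index where pattern occurs in text (hypothetical helper)."""
    n, m = len(text), len(pattern)
    matches = []
    # Try every alignment of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        # Compare one character at a time until a mismatch or a full match.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(i)
    return matches


print(brute_force_search("ABRACADABRA", "ABRA"))  # [0, 7]
```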
Brute Force Complexity
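• Given a pattern M characters in length and a text N characters in length...
• Worst case: the pattern is compared against every substring of the text of length M (there are N−M+1 of them), and each comparison examines up to M characters, for O(MN) character comparisons in total. For example, with M=5, each of the N−4 alignments can cost up to 5 comparisons.
• This kind of worst case can occur in image data, where long runs of identical values are common.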
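To make the worst case concrete, the sketch below counts character comparisons on a hypothetical input: a run of identical characters, searched for a pattern of length M = 5 that fails only on its last character. The helper name `count_comparisons` and the input strings are illustrative assumptions, not taken from the slides.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by the brute-force search."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        for j in range(m):
            comparisons += 1          # one character comparison
            if text[i + j] != pattern[j]:
                break                 # mismatch: shift the pattern by one
    return comparisons


# Hypothetical worst-case input: N = 20, M = 5.
print(count_comparisons("A" * 20, "AAAAH"))  # 80, i.e. M * (N - M + 1) = 5 * 16
```

Every alignment matches four characters before failing on the fifth, so the cost grows as M(N−M+1).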
Rabin-Karp Example
• Hash value of “AAAAA” is 37
• Hash value of “AAAAH” is 100
Rabin-Karp Algorithm
pattern is M characters long
hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
do
    if (hash_p == hash_t)
        brute force comparison of pattern and selected section of text
        (equal hashes may still be a collision)
    hash_t = hash value of next section of text, one character over
while (not end of text)
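The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `rabin_karp` is hypothetical, each character's `ord()` value is used as its digit, and the defaults b = 256 and q = 1_000_000_007 (a prime) are arbitrary choices. The rolling update follows the shift, subtract, add order used in the Rabin-Karp Mods slide.

```python
from typing import List


def rabin_karp(text: str, pattern: str, b: int = 256, q: int = 1_000_000_007) -> List[int]:
    """Return every index where pattern occurs in text (hypothetical helper)."""
    n, m = len(text), len(pattern)
    if m > n:
        return []
    b_m = pow(b, m, q)  # b^M mod q, weight of the digit that leaves the window
    # hash_p: hash of the pattern; hash_t: hash of the first M characters of text.
    hash_p = hash_t = 0
    for k in range(m):
        hash_p = (hash_p * b + ord(pattern[k])) % q
        hash_t = (hash_t * b + ord(text[k])) % q
    matches = []
    for i in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a direct comparison.
        if hash_p == hash_t and text[i:i + m] == pattern:
            matches.append(i)
        if i + m < n:
            # Slide one character over: shift left, drop text[i], add text[i + m].
            hash_t = (hash_t * b - ord(text[i]) * b_m + ord(text[i + m])) % q
    return matches


print(rabin_karp("ABRACADABRA", "ABRA"))  # [0, 7]
```

Because every hash hit is verified, the result stays correct even with a poor choice of q; only the running time suffers.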
Hash Function
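• Let b be the number of letters in the alphabet, and treat each letter t[j] as a digit from 0 to b−1. The text subsequence t[i .. i+M-1] is mapped to the number
$$t[i]\,b^{M-1} + t[i+1]\,b^{M-2} + \cdots + t[i+M-1]$$
that is, the window is read as an M-digit number in base b.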
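A small sketch of the mapping t[i]·b^(M−1) + t[i+1]·b^(M−2) + … + t[i+M−1], evaluated with Horner's rule. The alphabet (A to Z, so b = 26, with A = 0) and the helper name `window_value` are hypothetical choices for illustration; no modulus is applied yet.

```python
def window_value(s: str, alphabet: str) -> int:
    """Map s to s[0]*b^(M-1) + s[1]*b^(M-2) + ... + s[M-1], with b = len(alphabet)."""
    b = len(alphabet)
    value = 0
    for ch in s:
        # Horner's rule: shift the digits so far left one place, then add the new digit.
        value = value * b + alphabet.index(ch)
    return value


ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # hypothetical: b = 26, A = 0 ... Z = 25
print(window_value("CAB", ALPHABET))  # 2*26**2 + 0*26 + 1 = 1353
```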
Rabin-Karp Mods
• If M is large, then the resulting value (about $b^M$) will be enormous. For this reason, we reduce it to a hash by taking it mod a prime number q.
• The mod function is particularly useful here because of several of its properties:
[(x mod q) + (y mod q)] mod q = (x+y) mod q
[(x mod q) − (y mod q)] mod q = (x−y) mod q
[(x mod q) · (y mod q)] mod q = (x·y) mod q
(x mod q) mod q = x mod q
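• Therefore each term can be reduced mod q as soon as it is computed, and the hash of the next window follows from the hash of the current one:
$$h(i) = \big( (t[i]\,b^{M-1} \bmod q) + (t[i+1]\,b^{M-2} \bmod q) + \cdots + (t[i+M-1] \bmod q) \big) \bmod q$$
$$h(i+1) = \big( \underbrace{h(i)\,b \bmod q}_{\text{shift left one digit}} - \underbrace{t[i]\,b^{M} \bmod q}_{\text{subtract leftmost digit}} + \underbrace{t[i+M] \bmod q}_{\text{add new rightmost digit}} \big) \bmod q$$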
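The rolling update h(i+1) = (h(i)·b − t[i]·b^M + t[i+M]) mod q can be checked numerically against a from-scratch computation of each window's hash. In this sketch the helper names `direct_hash` and `roll`, the use of `ord()` values as digits, and the parameters b = 256 and q = 101 are all hypothetical choices.

```python
def direct_hash(t: str, i: int, m: int, b: int, q: int) -> int:
    """h(i) from scratch: t[i]*b^(M-1) + ... + t[i+M-1], reduced mod q at each step."""
    h = 0
    for k in range(m):
        h = (h * b + ord(t[i + k])) % q
    return h


def roll(h_i: int, t: str, i: int, m: int, b: int, q: int) -> int:
    """h(i+1) from h(i): shift left one digit, subtract leftmost digit, add new rightmost digit."""
    b_m = pow(b, m, q)  # b^M mod q, computed without forming b^M itself
    # Python's % returns a non-negative result for q > 0, even if the sum is negative.
    return (h_i * b % q - ord(t[i]) * b_m % q + ord(t[i + m]) % q) % q


text, m, b, q = "RABINKARPROLLINGHASH", 5, 256, 101  # hypothetical parameters
h = direct_hash(text, 0, m, b, q)
for i in range(len(text) - m):
    h = roll(h, text, i, m, b, q)
    assert h == direct_hash(text, i + 1, m, b, q)
print("rolling update agrees with direct computation")
```

In languages whose `%` can return a negative value (C, Java), adding q before the final reduction keeps the hash in the range 0 to q−1.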
Rabin-Karp Complexity
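• If a sufficiently large prime number is used for the hash function, the hashed values of two different patterns will usually be distinct.
• If this is the case (and the pattern occurs only a few times), searching takes O(N) time, where N is the number of characters in the body of text: each window's hash is updated from the previous one in constant time, and the brute-force check rarely runs.
• It is always possible to construct a scenario with a worst-case complexity of O(MN). This, however, is likely to happen only if the prime number used for hashing is small, so that many windows collide with the pattern's hash, or if the pattern genuinely occurs at most positions of the text, since every match must still be verified character by character.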
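The effect of a small prime can be illustrated by counting spurious hits: windows whose hash equals the pattern's hash even though the text differs. In this sketch the helper name `count_spurious_hits`, b = 256, and the `ord()` digit values are hypothetical. The inputs are chosen so that q = 7 happens to divide the difference between the values of "AAAAA" and "AAAAH" (ord('H') − ord('A') = 7), so every window collides, while q = 1_000_000_007 separates them.

```python
def count_spurious_hits(text: str, pattern: str, b: int, q: int) -> int:
    """Count windows whose hash equals the pattern's hash but whose text differs."""
    m = len(pattern)

    def h(s: str) -> int:
        # Recomputed from scratch for clarity; a real search would roll the hash.
        v = 0
        for ch in s:
            v = (v * b + ord(ch)) % q
        return v

    hash_p = h(pattern)
    return sum(
        1
        for i in range(len(text) - m + 1)
        if h(text[i:i + m]) == hash_p and text[i:i + m] != pattern
    )


text, pattern = "A" * 20, "AAAAH"
print(count_spurious_hits(text, pattern, 256, 7))              # 16: every window collides
print(count_spurious_hits(text, pattern, 256, 1_000_000_007))  # 0
```

With q = 7 each of the 16 windows triggers a brute-force check, so the search degrades to the O(MN) behaviour of the brute-force algorithm.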