Rabin Karp

The document discusses string pattern matching algorithms. It describes the brute force algorithm and its pseudo-code, which has a worst-case time complexity of O(MN), where M is the pattern length and N is the text length. It also describes the Rabin-Karp algorithm, which uses hashing to bring the expected running time down to O(N), while its worst case remains O(MN). It explains how Rabin-Karp calculates hash values for the pattern and for text subsequences to avoid unnecessary character comparisons.

Strings and Pattern Matching

Brute Force
• The Brute Force algorithm compares the pattern to the text, one
character at a time, until a mismatch is found.

[Figure not reproduced. In it, compared characters are italicized and correct matches are in boldface type.]

• The algorithm can be designed to stop either at the first
occurrence of the pattern or upon reaching the end of the text.
Brute Force Pseudo-Code
• Here’s the pseudo-code:
do
    if (text letter == pattern letter)
        compare next letter of pattern to next letter of text
    else
        move pattern down text by one letter
until (entire pattern found or end of text)
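
The pseudo-code above can be sketched in Python roughly as follows. This is a minimal illustration, not the slides' own implementation; the function name `brute_force_search` and the choice to return -1 when nothing is found are assumptions made for the example.

```python
def brute_force_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Hypothetical helper written for illustration only.
    """
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):      # each alignment of the pattern against the text
        k = 0
        while k < m and text[start + k] == pattern[k]:
            k += 1                      # letters match: compare the next pair
        if k == m:                      # entire pattern found
            return start
        # mismatch: fall through, which moves the pattern down the text by one letter
    return -1                           # end of text reached without a match


print(brute_force_search("AAAAAAAAAAH", "AAAAH"))  # 6
```

This version stops at the first occurrence; collecting every `start` where `k == m` instead of returning would give the variant that runs to the end of the text.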

Brute Force-Complexity
• Given a pattern M characters in length, and a text N characters in
length...
• Worst case: compares the pattern to every substring of the text of length M,
and each of those comparisons runs through all M characters.
For example, M=5 (example figure not reproduced).
• This kind of case can occur for image data.

Total number of comparisons: M (N-M+1)


Worst case time complexity: O(MN)
Brute Force-Complexity (cont.)
• Given a pattern M characters in length, and a text N characters in
length...
• Best case if pattern found: Finds pattern in first M positions of text.
For example, M=5.

Total number of comparisons: M

Best case time complexity: O(M)
Brute Force-Complexity (cont.)
• Given a pattern M characters in length, and a text N characters in length...
• Best case if pattern not found: Always mismatch on first character. For
example, M=5.

Total number of comparisons: N-M+1 (one per alignment), which is at most N


Best case time complexity: O(N)
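
The three cases above can be checked by counting character comparisons directly. The sketch below is illustrative only: `count_comparisons` is a hypothetical helper, and the sample strings (M=5, N=20) are assumptions chosen to trigger each case, not the examples from the original figures.

```python
def count_comparisons(text: str, pattern: str) -> tuple[int, int]:
    """Brute-force search returning (index of first match or -1, character comparisons made)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        k = 0
        while k < m:
            comparisons += 1            # one text letter compared with one pattern letter
            if text[start + k] != pattern[k]:
                break
            k += 1
        if k == m:
            return start, comparisons
    return -1, comparisons


# Worst case: every alignment mismatches only on the last letter -> M*(N-M+1) = 5*16
print(count_comparisons("A" * 20, "AAAAH"))             # (-1, 80)
# Best case, pattern found: match at the very first alignment -> M
print(count_comparisons("AAAAA" + "B" * 15, "AAAAA"))   # (0, 5)
# Best case, pattern not found: mismatch on the first letter of every alignment -> N-M+1
print(count_comparisons("A" * 20, "OOOOH"))             # (-1, 16)
```

The not-found best case costs one comparison per alignment, so it stays within the O(N) bound.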
Rabin-Karp

• The Rabin-Karp string searching algorithm calculates a hash value
for the pattern, and for each M-character subsequence of text to be
compared.
• If the hash values are unequal, the algorithm will calculate the hash
value for the next M-character sequence.
• If the hash values are equal, the algorithm will do a Brute Force
comparison between the pattern and the M-character sequence.
• In this way, there is only one hash comparison per text subsequence, and
Brute Force is only needed when hash values match.

Rabin-Karp Example
• Hash value of “AAAAA” is 37
• Hash value of “AAAAH” is 100

Rabin-Karp Algorithm
pattern is M characters long
hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
do
    if (hash_p == hash_t)
        brute force comparison of pattern and selected section of text
    hash_t = hash value of next section of text, one character over
until (end of text)

Hash Function
• Let b be the number of letters in the alphabet. The text subsequence t[i .. i+M-1] is
mapped to the number

$$x(i) = t[i]\,b^{M-1} + t[i+1]\,b^{M-2} + \dots + t[i+M-1]$$

• Furthermore, given x(i) we can compute x(i+1) for the next
subsequence t[i+1 .. i+M] in constant time, as follows:

$$x(i+1) = x(i)\,b - t[i]\,b^{M} + t[i+M]$$

• In this way, we never compute a new value from scratch. We
simply adjust the existing value as we move over one
character.
Rabin-Karp Math Example

• Let’s say that our alphabet consists of 10 letters.
• Our alphabet = a, b, c, d, e, f, g, h, i, j
• Let’s say that “a” corresponds to 1, “b” corresponds to 2 and so
on.
The hash value for string “cah” would be ...

3*100 + 1*10 + 8*1 = 318
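
The arithmetic above, and the constant-time adjustment when the window moves one character, can be sketched as follows. The names `VALUE`, `BASE`, `poly_value`, and `roll` are hypothetical, and the follow-on window "ahb" is an assumed example that does not appear in the slides.

```python
# Assumption from the slide: letters a..j map to 1..10 and the base is 10.
VALUE = {ch: i + 1 for i, ch in enumerate("abcdefghij")}
BASE = 10


def poly_value(s: str) -> int:
    """Value of s read as a base-10 number whose digits are the letter values."""
    x = 0
    for ch in s:
        x = x * BASE + VALUE[ch]        # Horner's rule: shift left, add the next digit
    return x


def roll(x: int, old: str, new: str, m: int) -> int:
    """Move an m-letter window one position: shift left, drop `old`, append `new`."""
    return x * BASE - VALUE[old] * BASE ** m + VALUE[new]


print(poly_value("cah"))                # 318
print(roll(318, "c", "b", 3))           # 182, the value of "ahb"
print(poly_value("ahb"))                # 182
```

The second line works out as 318*10 - 3*1000 + 2 = 182, which agrees with computing "ahb" from scratch.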

Rabin-Karp Mods
• If M is large, then the resulting value (~$b^M$) will be enormous. For this reason,
we hash the value by taking it mod a prime number q.
• The mod function is particularly useful in this case due to several of its
inherent properties:
[(x mod q) + (y mod q)] mod q = (x+y) mod q
(x mod q) mod q = x mod q
• For these reasons:

$$h(i) = \big((t[i]\,b^{M-1} \bmod q) + (t[i+1]\,b^{M-2} \bmod q) + \dots + (t[i+M-1] \bmod q)\big) \bmod q$$

$$h(i+1) = \big(\underbrace{h(i)\,b \bmod q}_{\text{shift left one digit}} \;-\; \underbrace{t[i]\,b^{M} \bmod q}_{\text{subtract leftmost digit}} \;+\; \underbrace{t[i+M] \bmod q}_{\text{add new rightmost digit}}\big) \bmod q$$
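
Putting the modular rolling hash together with the brute-force check gives a search routine along the following lines. This is a sketch under stated assumptions rather than the slides' implementation: the function name `rabin_karp`, the base 256, the use of `ord` as the letter value, and the prime 1,000,000,007 are all choices made for the example.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, q: int = 1_000_000_007) -> list[int]:
    """Return the start index of every occurrence of pattern in text.

    Assumed parameters: base 256 and prime q = 1_000_000_007 are illustrative choices.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m, q)              # b^M mod q, used to subtract the leftmost digit
    hash_p = hash_t = 0
    for k in range(m):                  # hash the pattern and the first M letters of the text
        hash_p = (hash_p * base + ord(pattern[k])) % q
        hash_t = (hash_t * base + ord(text[k])) % q
    matches = []
    for i in range(n - m + 1):
        # Brute-force comparison only when the hash values agree
        if hash_p == hash_t and text[i:i + m] == pattern:
            matches.append(i)
        if i + m < n:
            # shift left one digit, subtract leftmost digit, add new rightmost digit
            hash_t = (hash_t * base - ord(text[i]) * high + ord(text[i + m])) % q
    return matches


print(rabin_karp("abracadabra", "abra"))   # [0, 7]
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction needs no extra correction; in languages where the remainder can be negative, adding q before the final mod is the usual fix.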
Rabin-Karp Complexity
• If a sufficiently large prime number is used for the hash function,
the hashed values of two different patterns will usually be distinct.
• If this is the case, searching takes O(N) expected time, where N is the
number of characters in the larger body of text.
• It is always possible to construct a scenario with a worst case
complexity of O(MN), in which a Brute Force comparison is triggered at
nearly every position. Spurious hash matches on that scale, however, are
likely only if the prime number used for hashing is small.
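
The effect of the prime's size can be seen by counting spurious hits, that is, positions where the hashes agree but the strings differ. The sketch below is illustrative: `rabin_karp_stats`, the base 256, and both primes are assumptions. With q = 3 the hash reduces to the sum of the character codes mod 3 (because 256 mod 3 = 1), so any rearrangement of the pattern's letters collides with it.

```python
def rabin_karp_stats(text: str, pattern: str, q: int, base: int = 256) -> tuple[list[int], int]:
    """Return (match positions, number of spurious hash hits) for a given prime q."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    high = pow(base, m, q)              # b^M mod q
    hash_p = hash_t = 0
    for k in range(m):
        hash_p = (hash_p * base + ord(pattern[k])) % q
        hash_t = (hash_t * base + ord(text[k])) % q
    matches, spurious = [], 0
    for i in range(n - m + 1):
        if hash_p == hash_t:            # each hash hit costs an O(M) brute-force check
            if text[i:i + m] == pattern:
                matches.append(i)
            else:
                spurious += 1           # hashes agree but the strings differ
        if i + m < n:
            hash_t = (hash_t * base - ord(text[i]) * high + ord(text[i + m])) % q
    return matches, spurious


print(rabin_karp_stats("abcabc", "abc", q=3))               # ([0, 3], 2)
print(rabin_karp_stats("abcabc", "abc", q=1_000_000_007))   # ([0, 3], 0)
```

Both runs report the same matches, since every hash hit is verified; only the amount of wasted brute-force work changes with q.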
