BNP Unit-5 Lecture 20 KMP 5.2
(KCS-503)
Unit-5
String matching
Course Outline:-
⮚ Knuth-Morris-Pratt String Matching
The Knuth-Morris-Pratt algorithm
We now present a linear-time string-matching algorithm due to Knuth, Morris, and
Pratt. Their algorithm achieves a Θ(n + m) running time (Θ(m) for preprocessing
plus Θ(n) for matching) by avoiding the computation of the transition function δ
altogether. It does the pattern matching using just an auxiliary function
π[1 . . m], precomputed from the pattern in time O(m). The array π allows the
transition function δ to be computed efficiently (in an amortized sense) "on the
fly" as needed. Roughly speaking, for any state q = 0, 1, . . . , m, and any
character a ∈ Σ, the value π[q] contains the information that is independent of a
and is needed to compute δ(q, a). Since the array π has only m entries, whereas δ
has O(m |Σ|) entries, we save a factor of |Σ| in the preprocessing by computing π
rather than δ.
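To make the "on the fly" idea concrete, here is a minimal Python sketch of how δ(q, a) could be derived from π without storing a transition table. The function name delta and the list layout (index 0 unused, so that pi[q] matches π[q]) are assumptions made for this illustration. The π values for the pattern ababaca are written out by hand.

```python
def delta(pattern, pi, q, a):
    """Return the automaton state reached from state q on character a.

    pi is a list where pi[q] holds the prefix-function value for q = 1..m;
    index 0 is unused (a layout assumed for this sketch).
    """
    m = len(pattern)
    # Fall back to shorter and shorter matched prefixes until one of them
    # can be extended by a, or until nothing is matched (q == 0).
    # In 0-indexed Python, pattern[q] is the character P[q + 1].
    while q > 0 and (q == m or pattern[q] != a):
        q = pi[q]
    if q < m and pattern[q] == a:
        q += 1
    return q


# pi for the pattern "ababaca", index 0 unused.
PI_ABABACA = [0, 0, 0, 1, 2, 3, 0, 1]

print(delta("ababaca", PI_ABABACA, 5, "c"))  # 6: the match is extended
print(delta("ababaca", PI_ABABACA, 5, "b"))  # 4: falls back to "aba", then extends
```

Only π[q] and the single character a are consulted, which is why no table with m |Σ| entries is needed.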
Components of KMP Algorithm
1. The Prefix Function (Π): The prefix function Π for a pattern encapsulates
knowledge about how the pattern matches against shifts of itself. This
information can be used to avoid useless shifts of the pattern 'P'. In other
words, it avoids backtracking over the text 'T'.
2. The KMP Matcher: With text 'T', pattern 'P' and prefix function 'Π' as inputs,
it finds the occurrences of 'P' in 'T' and reports the number of shifts of 'P'
after which each occurrence is found.
The Prefix Function (Π)
COMPUTE-PREFIX-FUNCTION (P)
1. m ← length [P]                      // 'P' is the pattern to be matched
2. Π [1] ← 0
3. k ← 0
4. for q ← 2 to m
5.     do while k > 0 and P [k + 1] ≠ P [q]
6.            do k ← Π [k]
7.        if P [k + 1] = P [q]
8.            then k ← k + 1
9.        Π [q] ← k
10. return Π
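The pseudocode above can be sketched in Python as follows. This is one possible translation, not the only one: the name compute_prefix_function and the 0-indexed return list (pi[q - 1] holds Π[q]) are choices made for this example.

```python
def compute_prefix_function(pattern):
    """Return the prefix function of pattern as a 0-indexed list.

    pi[q - 1] corresponds to Π[q] in the 1-indexed pseudocode.
    """
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the longest proper prefix matched so far
    for q in range(1, m):
        # Shrink k until pattern[k] can extend the matched prefix.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```

Because Python lists start at 0, the fallback reads pi[k - 1] where the pseudocode reads Π[k].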
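Running Time Analysis:
In the above pseudocode for calculating the prefix function, the for loop
from step 4 to step 9 runs m - 1 times. Step 1 to step 3 take constant time.
The inner while loop (steps 5 and 6) is bounded by an amortized argument: k is
increased only in step 8, by at most 1 per iteration of the for loop, so it is
increased at most m - 1 times in total. Each iteration of the while loop
decreases k (since Π [k] < k), and k never becomes negative, so the while loop
runs at most m - 1 times over the whole computation.
Hence the running time of computing the prefix function is O (m).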
Numericals
Q2: Compute the prefix function for the pattern ababaca when the
alphabet is Σ = {a, b, c}.
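One way to check an answer to this exercise is to apply the definition directly: Π[q] is the length of the longest proper prefix of P that is also a suffix of P[1 . . q]. The Python sketch below does this by brute force. It is meant for verification only, since it is slower than the O(m) procedure, and the function name is hypothetical.

```python
def prefix_function_by_definition(pattern):
    """Compute the prefix function directly from its definition (brute force)."""
    pi = []
    for q in range(1, len(pattern) + 1):
        prefix_q = pattern[:q]
        best = 0
        # Try every proper prefix length k < q and keep the longest one
        # that is also a suffix of the first q characters.
        for k in range(1, q):
            if prefix_q.endswith(pattern[:k]):
                best = k
        pi.append(best)
    return pi


pattern = "ababaca"
for q, value in enumerate(prefix_function_by_definition(pattern), start=1):
    print(q, pattern[q - 1], value)
```

For ababaca this yields Π = 0, 0, 1, 2, 3, 0, 1 for q = 1, . . . , 7.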
The KMP Matcher
KMP-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. Π ← COMPUTE-PREFIX-FUNCTION (P)
4. q ← 0                                  // number of characters matched
5. for i ← 1 to n                         // scan T from left to right
6.     do while q > 0 and P [q + 1] ≠ T [i]
7.            do q ← Π [q]                // next character does not match
8.        if P [q + 1] = T [i]
9.            then q ← q + 1              // next character matches
10.       if q = m                        // is all of P matched?
11.           then print "Pattern occurs with shift" i - m
12.                q ← Π [q]              // look for the next match
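A minimal Python sketch of the matcher is given below. It returns the list of shifts instead of printing them, computes the prefix table inline so that the block is self-contained, and treats an empty pattern as having no matches. All three are choices made for this example rather than part of the pseudocode. The sample text is illustrative only, chosen so that the pattern ababaca occurs with shift 6.

```python
def kmp_matcher(text, pattern):
    """Return every shift s at which pattern occurs in text (0-indexed)."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []  # assumption of this sketch: empty pattern yields no matches

    # Prefix table, 0-indexed: pi[q - 1] corresponds to Π[q].
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k

    shifts = []
    q = 0  # number of pattern characters matched so far
    for i in range(n):  # scan the text from left to right
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]  # next character does not match
        if pattern[q] == text[i]:
            q += 1  # next character matches
        if q == m:  # all of the pattern is matched
            shifts.append(i - m + 1)  # same shift as i - m with 1-indexed i
            q = pi[q - 1]  # look for the next match
    return shifts


print(kmp_matcher("bacbabababacaab", "ababaca"))  # [6]
print(kmp_matcher("abababab", "abab"))            # [0, 2, 4]
```

The second call shows that overlapping occurrences are reported, which is the purpose of step 12.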
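Running Time Analysis
The for loop beginning in step 5 runs 'n' times, i.e., once for each character of
the text 'T'. Steps 1, 2 and 4 take constant time, and step 3 takes O (m) time.
Within the loop, q is increased by at most 1 per iteration (step 9) and every
iteration of the while loop decreases q, so the while loop runs at most n times
in total. Thus the running time of the matching phase is O (n), and the running
time of the whole algorithm, including preprocessing, is O (m + n).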
Pattern 'P' has been found to occur in the string 'T'. The
total number of shifts that took place for the match to be found is
i - m = 13 - 7 = 6 shifts.
The End
B N Pandey 7/5/2020