|
| 1 | +<!--?title Prefix function. Knuth–Morris–Pratt algorithm --> |
| 2 | + |
| 3 | +# Prefix function. Knuth–Morris–Pratt algorithm |
| 4 | + |
| 5 | +## Prefix function definition |
| 6 | + |
| 7 | +You are given a string $s$ of length $n$ $s[0..n-1]$. **Prefix function** for this string is defined as an array $\pi$ |
| 8 | +of length $n$ $\pi[0..n-1]$, where $\pi[i]$ is the length of the longest proper prefix of a substring $s[0...i]$ |
| 9 | +which is also a suffix of this substring. A proper prefix of a string is a prefix that is not equal to the string itself. |
| 10 | +By definition, $\pi[0] = 0$. |
| 11 | + |
| 12 | +Mathematically the definition of the prefix function can be written as follows: |
| 13 | + |
| 14 | +$$\pi[i] = \max_ {k = 0..i} \\{k : s[0..k-1] = s[i-(k-1)..i] \\}$$ |
| 15 | + |
| 16 | +For example, prefix function of string "abcabcd" is $[0, 0, 0, 1, 2, 3, 0]$, and prefix function of string "aabaaab" is $[0, 1, 0, 1, 2, 2, 3]$. |
| 17 | + |
| 18 | +## Trivial Algorithm |
| 19 | + |
| 20 | +An algorithm which follows the definition of prefix function exactly is the following: |
| 21 | + |
| 22 | +```cpp |
| 23 | +vector<int> prefix_function (string s) { |
| 24 | + int n = (int) s.length(); |
| 25 | + vector<int> pi (n); |
| 26 | + for (int i=0; i<n; ++i) |
| 27 | + for (int k=0; k<=i; ++k) |
| 28 | + if (s.substr(0,k) == s.substr(i-k+1,k)) |
| 29 | + pi[i] = k; |
| 30 | + return pi; |
| 31 | +} |
| 32 | +``` |
| 33 | +
|
| 34 | +It is easy to see that its complexity is $O(n^3)$, which has room for improvement. |
| 35 | +
|
| 36 | +## Efficient Algorithm |
| 37 | +
|
| 38 | +This algorithm was proposed by Knuth and Pratt and independently from them by Morris in 1977. |
| 39 | +It was used as the main element of a substring search algorithm. |
| 40 | +
|
| 41 | +### First Optimization |
| 42 | +
|
| 43 | +### Second Optimization |
| 44 | +
|
| 45 | +### Final Algorithm |
| 46 | +
|
| 47 | +### Implementation |
| 48 | +
|
| 49 | +The implementation ends up being surprisingly short and expressive. |
| 50 | +
|
| 51 | +```cpp |
| 52 | +vector<int> prefix_function (string s) { |
| 53 | + int n = (int) s.length(); |
| 54 | + vector<int> pi (n); |
| 55 | + for (int i=1; i<n; ++i) { |
| 56 | + int j = pi[i-1]; |
| 57 | + while (j > 0 && s[i] != s[j]) |
| 58 | + j = pi[j-1]; |
| 59 | + if (s[i] == s[j]) ++j; |
| 60 | + pi[i] = j; |
| 61 | + } |
| 62 | + return pi; |
| 63 | +} |
| 64 | +``` |
| 65 | + |
| 66 | +This is an *online* algorithm, i.e. it processes the data as it arrives - for example, you can read the string |
| 67 | +characters one by one and process them immediately, finding the value of prefix function for each next character. |
| 68 | +The algorithm still requires storing the string itself and the previously calculated values of prefix function, |
| 69 | +but if we know beforehand the maximum value $M$ the prefix function can take on the string, we can store only $M+1$ |
| 70 | +first characters of the string and the same number of values of the prefix function. |
| 71 | + |
| 72 | +## Practice Problems |
| 73 | + |
| 74 | +* [UVA # 455 "Periodic Strings"](http://uva.onlinejudge.org/index.php?option=onlinejudge&page=show_problem&problem=396) |
| 75 | +* [UVA # 11022 "String Factoring"](http://uva.onlinejudge.org/index.php?option=onlinejudge&page=show_problem&problem=1963) |
| 76 | +* [UVA # 11452 "Dancing the Cheeky-Cheeky"](http://uva.onlinejudge.org/index.php?option=onlinejudge&page=show_problem&problem=2447) |
0 commit comments