src/string/z-function.md
Lines changed: 35 additions & 31 deletions
@@ -49,31 +49,31 @@ To obtain an efficient algorithm we will compute the values of $z[i]$ in turn fr
49
49
50
50
For the sake of brevity, let's call **segment matches** those substrings that coincide with a prefix of $s$. For example, the value of the desired Z-function $z[i]$ is the length of the segment match starting at position $i$ (and that ends at position $i + z[i] - 1$).
51
51
52
-
To do this, we will keep **the $[l, r]$ indices of the rightmost segment match**. That is, among all detected segments we will keep the one that ends rightmost. In a way, the index $r$ can be seen as the "boundary" to which our string $s$ has been scanned by the algorithm; everything beyond that point is not yet known.
52
+
To do this, we will keep **the boundaries $[l, r)$ of the rightmost segment match**, where $r$ is exclusive. That is, among all detected segments we will keep the one that ends rightmost. In a way, the index $r$ can be seen as the "boundary" up to which our string $s$ has been scanned by the algorithm; everything from position $r$ onward is not yet known.
53
53
54
54
Then, if the current index (for which we have to compute the next value of the Z-function) is $i$, we have one of two options:
55
55
56
-
* $i > r$ -- the current position is **outside** of what we have already processed.
56
+
* $i \geq r$ -- the current position is **outside** of what we have already processed.
57
57
58
-
We will then compute $z[i]$ with the **trivial algorithm** (that is, just comparing values one by one). Note that in the end, if $z[i] > 0$, we'll have to update the indices of the rightmost segment, because it's guaranteed that the new $r = i + z[i] - 1$ is better than the previous $r$.
58
+
We will then compute $z[i]$ with the **trivial algorithm** (that is, just comparing values one by one). Note that in the end, if $z[i] > 0$, we'll have to update the indices of the rightmost segment, because it's guaranteed that the new $r = i + z[i]$ is better than the previous $r$.
59
59
60
-
* $i \le r$ -- the current position is inside the current segment match $[l, r]$.
60
+
* $i < r$ -- the current position is inside the current segment match $[l, r)$.
61
61
62
62
Then we can use the already computed Z-values to initialize $z[i]$ with a lower bound, possibly a large one, instead of starting from zero.
63
63
64
-
For this, we observe that the substrings $s[l \dots r]$ and $s[0 \dots r-l]$ **match**. This means that as an initial approximation for $z[i]$ we can take the value already computed for the corresponding segment $s[0 \dots r-l]$, and that is $z[i-l]$.
64
+
For this, we observe that the substrings $s[l \dots r)$ and $s[0 \dots r-l)$ **match**. This means that as an initial approximation for $z[i]$ we can take the value already computed for the corresponding segment $s[0 \dots r-l)$, and that is $z[i-l]$.
65
65
66
66
However, the value $z[i-l]$ could be too large: when applied to position $i$, it could extend past the boundary $r$. This is not allowed because we know nothing about the characters at position $r$ and beyond: they may differ from those required.
67
67
68
68
Here is **an example** of a similar scenario:
69
69
70
70
$$ s = "aaaabaa" $$
71
71
72
-
When we get to the last position ($i = 6$), the current match segment will be $[5, 6]$. Position $6$ will then match position $6 - 5 = 1$, for which the value of the Z-function is $z[1] = 3$. Obviously, we cannot initialize $z[6]$ to $3$, it would be completely incorrect. The maximum value we could initialize it to is $1$ -- because it's the largest value that doesn't bring us beyond the index $r$ of the match segment $[l, r]$.
72
+
When we get to the last position ($i = 6$), the current match segment will be $[5, 7)$. Position $6$ will then match position $6 - 5 = 1$, for which the value of the Z-function is $z[1] = 3$. Obviously, we cannot initialize $z[6]$ to $3$; it would be completely incorrect. The maximum value we could initialize it to is $1$ -- because it's the largest value that doesn't bring us beyond the boundary $r$ of the match segment $[l, r)$.
73
73
74
74
Thus, as an **initial approximation** for $z[i]$ we can safely take:
75
75
76
-
$$ z_0[i] = \min(r - i + 1,\; z[i-l]) $$
76
+
$$ z_0[i] = \min(r - i,\; z[i-l]) $$
77
77
78
78
After having $z[i]$ initialized to $z_0[i]$, we try to increment $z[i]$ by running the **trivial algorithm** -- because in general, after the border $r$, we cannot know if the segment will continue to match or not.
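
To make the clamp concrete, here is a minimal sketch that computes the Z-values of the example string with the trivial algorithm and then evaluates $z_0[i]$ for a given half-open segment. The helper names `z_trivial` and `initial_approximation` are illustrative assumptions, not part of the article's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Trivial O(n^2) algorithm: compare s with its suffix starting at i,
// one character at a time. z[0] is left as 0 by convention.
std::vector<int> z_trivial(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> z(n, 0);
    for (int i = 1; i < n; i++) {
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }
    }
    return z;
}

// z_0[i] = min(r - i, z[i - l]) for a half-open match segment [l, r)
// that contains i (hypothetical helper, for illustration only).
int initial_approximation(const std::vector<int>& z, int l, int r, int i) {
    assert(l < i && i < r);
    return std::min(r - i, z[i - l]);
}
```

For $s = "aaaabaa"$, calling `initial_approximation(z_trivial(s), 5, 7, 6)` takes the minimum of $r - i = 1$ and $z[1] = 3$, so the starting value for $z[6]$ is capped at $1$.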
79
79
@@ -83,36 +83,40 @@ The algorithm turns out to be very simple. Despite the fact that on each iterati
83
83
84
84
## Implementation
85
85
86
-
Implementation turns out to be rather laconic:
86
+
Implementation turns out to be rather concise:
87
87
88
88
```cpp
89
89
vector<int> z_function(string s) {
90
-
int n = (int) s.length();
91
-
vector<int> z(n);
92
-
int l = 0, r = 0;
93
-
for (int i = 1; i < n; ++i) {
94
-
if (i <= r)
95
-
z[i] = min (r - i + 1, z[i - l]);
96
-
while (i + z[i] < n && s[z[i]] == s[i + z[i]])
97
-
++z[i];
98
-
if (i + z[i] - 1 > r)
99
-
l = i, r = i + z[i] - 1;
90
+
int n = s.size();
91
+
vector<int> z(n);
92
+
int l = 0, r = 0;
93
+
for(int i = 1; i < n; i++) {
94
+
if(i < r) {
95
+
z[i] = min(r - i, z[i - l]);
100
96
}
101
-
return z;
97
+
while(i + z[i] < n && s[z[i]] == s[i + z[i]]) {
98
+
z[i]++;
99
+
}
100
+
if(i + z[i] > r) {
101
+
l = i;
102
+
r = i + z[i];
103
+
}
104
+
}
105
+
return z;
102
106
}
103
107
```
104
108
105
109
### Comments on this implementation
106
110
107
111
The whole solution is given as a function which returns an array of length $n$ -- the Z-function of $s$.
108
112
109
-
Array $z$ is initially filled with zeros. The current rightmost match segment is assumed to be $[0; 0]$ (that is, a deliberately small segment which doesn't contain any $i$).
113
+
Array $z$ is initially filled with zeros. The current rightmost match segment is assumed to be $[0, 0)$ (that is, an empty segment which doesn't contain any $i$).
110
114
111
115
Inside the loop for $i = 1 \dots n - 1$ we first determine the initial value $z[i]$ -- it will either remain zero or be computed using the above formula.
112
116
113
117
Thereafter, the trivial algorithm attempts to increase the value of $z[i]$ as much as possible.
114
118
115
-
In the end, if it's required (that is, if $i + z[i]- 1 > r$), we update the rightmost match segment $[l, r]$.
119
+
In the end, if it's required (that is, if $i + z[i] > r$), we update the rightmost match segment $[l, r)$.
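
As a rough illustration of how the segment evolves, the sketch below runs the same loop with a half-open $[l, r)$ and records the segment as each index $i$ sees it. The function name `trace_segments` is an assumption made for this example, and the explicit `i + z[i] < n` bound is included so the comparison never relies on what lies past the end of the string.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Runs the algorithm with a half-open segment [l, r) and records the
// segment as it stands when each index i = 1 .. n-1 is about to be processed.
std::vector<std::pair<int, int>> trace_segments(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> z(n, 0);
    std::vector<std::pair<int, int>> segments;
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        segments.emplace_back(l, r);  // segment seen by iteration i
        if (i < r) {
            z[i] = std::min(r - i, z[i - l]);
        }
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }
        if (i + z[i] > r) {
            l = i;
            r = i + z[i];
        }
        assert(r <= n);  // the segment never extends past the string
    }
    return segments;
}
```

On $s = "aaaabaa"$ the recorded segment starts as $[0, 0)$ for $i = 1$, becomes $[1, 4)$ for $i = 2 \dots 5$, and is $[5, 7)$ when $i = 6$ is processed, which agrees with the example above.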
116
120
117
121
## Asymptotic behavior of the algorithm
118
122
@@ -126,29 +130,29 @@ We will show that **each iteration** of the `while` loop will increase the right
126
130
127
131
To do that, we will consider both branches of the algorithm:
128
132
129
-
* $i > r$
133
+
* $i \geq r$
130
134
131
135
In this case, either the `while` loop won't make any iteration (if $s[0] \ne s[i]$), or it will take a few iterations, starting at position $i$, each time moving one character to the right. After that, the right border $r$ will necessarily be updated.
132
136
133
-
So we have found that, when $i > r$, each iteration of the `while` loop increases the value of the new $r$ index.
137
+
So we have found that, when $i \geq r$, each iteration of the `while` loop increases the value of the new $r$ index.
134
138
135
-
* $i \le r$
139
+
* $i < r$
136
140
137
-
In this case, we initialize $z[i]$ to a certain value $z_0$ given by the above formula. Let's compare this initial value $z_0$ to the value $r - i + 1$. We will have three cases:
141
+
In this case, we initialize $z[i]$ to a certain value $z_0$ given by the above formula. Let's compare this initial value $z_0$ to the value $r - i$. We will have three cases:
138
142
139
-
* $z_0 < r - i + 1$
143
+
* $z_0 < r - i$
140
144
141
145
We prove that in this case no iteration of the `while` loop will take place.
142
146
143
-
It's easy to prove, for example, by contradiction: if the `while` loop made at least one iteration, it would mean that initial approximation $z[i] = z_0$ was inaccurate (less than the match's actual length). But since $s[l \dots r]$ and $s[0 \dots r-l]$ are the same, this would imply that $z[i-l]$ holds the wrong value (less than it should be).
147
+
It's easy to prove, for example, by contradiction: if the `while` loop made at least one iteration, it would mean that initial approximation $z[i] = z_0$ was inaccurate (less than the match's actual length). But since $s[l \dots r)$ and $s[0 \dots r-l)$ are the same, this would imply that $z[i-l]$ holds the wrong value (less than it should be).
144
148
145
-
Thus, since $z[i-l]$ is correct and it is less than $r - i + 1$, it follows that this value coincides with the required value $z[i]$.
149
+
Thus, since $z[i-l]$ is correct and it is less than $r - i$, it follows that this value coincides with the required value $z[i]$.
146
150
147
-
* $z_0 = r - i + 1$
151
+
* $z_0 = r - i$
148
152
149
-
In this case, the `while` loop can make a few iterations, but each of them will lead to an increase in the value of the $r$ index because we will start comparing from $s[r+1]$, which will climb beyond the $[l, r]$ interval.
153
+
In this case, the `while` loop can make a few iterations, but each of them will lead to an increase in the value of the $r$ index because we will start comparing from $s[r]$, which already lies outside the $[l, r)$ interval.
150
154
151
-
* $z_0 > r - i + 1$
155
+
* $z_0 > r - i$
152
156
153
157
This option is impossible, by definition of $z_0$.
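
The case analysis above can be checked empirically with a small instrumented sketch. It counts the successful `while` iterations and asserts, for every $i$, that $r$ moved right by at least that many positions. The name `count_increments` and the bookkeeping variables are assumptions introduced for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the total number of successful `while` iterations (increments of
// z[i]) over the whole run, using the half-open segment [l, r).
long long count_increments(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> z(n, 0);
    long long total = 0;
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        if (i < r) {
            z[i] = std::min(r - i, z[i - l]);
        }
        int r_before = r;
        int increments_here = 0;
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
            increments_here++;
        }
        if (i + z[i] > r) {
            l = i;
            r = i + z[i];
        }
        // Every increment pushed the right border at least one step further.
        assert(r - r_before >= increments_here);
        total += increments_here;
    }
    return total;
}
```

Since $r$ starts at $0$ and never exceeds $n$, the returned total is bounded by $n$. For example, a string of $1000$ letters `a` performs $999$ increments, all of them at $i = 1$.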