Commit 8b99a59

Oleksandr Kulkov authored
use half-intervals, clearer implementation
i + z[i] < n no longer needed, as s[s.size()] = 0
1 parent 833638a commit 8b99a59

1 file changed: +35 -31 lines changed

src/string/z-function.md

Lines changed: 35 additions & 31 deletions
@@ -49,31 +49,31 @@ To obtain an efficient algorithm we will compute the values of $z[i]$ in turn fr

 For the sake of brevity, let's call **segment matches** those substrings that coincide with a prefix of $s$. For example, the value of the desired Z-function $z[i]$ is the length of the segment match starting at position $i$ (and that ends at position $i + z[i] - 1$).

-To do this, we will keep **the $[l, r]$ indices of the rightmost segment match**. That is, among all detected segments we will keep the one that ends rightmost. In a way, the index $r$ can be seen as the "boundary" to which our string $s$ has been scanned by the algorithm; everything beyond that point is not yet known.
+To do this, we will keep **the $[l, r)$ indices of the rightmost segment match**. That is, among all detected segments we will keep the one that ends rightmost. In a way, the index $r$ can be seen as the "boundary" to which our string $s$ has been scanned by the algorithm; everything beyond that point is not yet known.

 Then, if the current index (for which we have to compute the next value of the Z-function) is $i$, we have one of two options:

-* $i > r$ -- the current position is **outside** of what we have already processed.
+* $i \geq r$ -- the current position is **outside** of what we have already processed.

-We will then compute $z[i]$ with the **trivial algorithm** (that is, just comparing values one by one). Note that in the end, if $z[i] > 0$, we'll have to update the indices of the rightmost segment, because it's guaranteed that the new $r = i + z[i] - 1$ is better than the previous $r$.
+We will then compute $z[i]$ with the **trivial algorithm** (that is, just comparing values one by one). Note that in the end, if $z[i] > 0$, we'll have to update the indices of the rightmost segment, because it's guaranteed that the new $r = i + z[i]$ is better than the previous $r$.

-* $i \le r$ -- the current position is inside the current segment match $[l, r]$.
+* $i < r$ -- the current position is inside the current segment match $[l, r)$.

 Then we can use the already calculated Z-values to "initialize" the value of $z[i]$ to something (it sure is better than "starting from zero"), maybe even some big number.

-For this, we observe that the substrings $s[l \dots r]$ and $s[0 \dots r-l]$ **match**. This means that as an initial approximation for $z[i]$ we can take the value already computed for the corresponding segment $s[0 \dots r-l]$, and that is $z[i-l]$.
+For this, we observe that the substrings $s[l \dots r)$ and $s[0 \dots r-l)$ **match**. This means that as an initial approximation for $z[i]$ we can take the value already computed for the corresponding segment $s[0 \dots r-l)$, and that is $z[i-l]$.

 However, the value $z[i-l]$ could be too large: when applied to position $i$ it could exceed the index $r$. This is not allowed because we know nothing about the characters to the right of $r$: they may differ from those required.

 Here is **an example** of a similar scenario:

 $$ s = "aaaabaa" $$

-When we get to the last position ($i = 6$), the current match segment will be $[5, 6]$. Position $6$ will then match position $6 - 5 = 1$, for which the value of the Z-function is $z[1] = 3$. Obviously, we cannot initialize $z[6]$ to $3$, it would be completely incorrect. The maximum value we could initialize it to is $1$ -- because it's the largest value that doesn't bring us beyond the index $r$ of the match segment $[l, r]$.
+When we get to the last position ($i = 6$), the current match segment will be $[5, 7)$. Position $6$ will then match position $6 - 5 = 1$, for which the value of the Z-function is $z[1] = 3$. Obviously, we cannot initialize $z[6]$ to $3$, it would be completely incorrect. The maximum value we could initialize it to is $1$ -- because it's the largest value that doesn't bring us beyond the index $r$ of the match segment $[l, r)$.

 Thus, as an **initial approximation** for $z[i]$ we can safely take:

-$$ z_0[i] = \min(r - i + 1,\; z[i-l]) $$
+$$ z_0[i] = \min(r - i,\; z[i-l]) $$

 After having $z[i]$ initialized to $z_0[i]$, we try to increment $z[i]$ by running the **trivial algorithm** -- because in general, after the border $r$, we cannot know if the segment will continue to match or not.

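To illustrate the half-interval clamp $z_0[i] = \min(r - i,\; z[i-l])$ from the hunk above, here is a minimal sketch that records the initial approximation used at each position. The function name `z_with_initial_guess` and the output vector `z0` are hypothetical and not part of the commit. The sketch also keeps an explicit `i + z[i] < n` bound, so it does not depend on the string terminator.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the commit): computes the Z-function with the
// half-open segment [l, r) and stores in z0[i] the initial approximation that
// was used before the trivial algorithm tried to extend it.
std::vector<int> z_with_initial_guess(const std::string& s, std::vector<int>& z0) {
    int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    z0.assign(n, 0);
    int l = 0, r = 0;  // empty segment [0, 0)
    for (int i = 1; i < n; i++) {
        if (i < r) {
            // Clamp so that i + z[i] never exceeds the scanned boundary r.
            z[i] = std::min(r - i, z[i - l]);
        }
        z0[i] = z[i];
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }
        assert(z[i] >= z0[i]);  // the extension step never shrinks the guess
        if (i + z[i] > r) {
            l = i;
            r = i + z[i];
        }
    }
    return z;
}
```

For `"aaaabaa"` the value recorded at $i = 6$ is $\min(7 - 6,\; z[1]) = 1$, which agrees with the example in the text.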
@@ -83,36 +83,40 @@ The algorithm turns out to be very simple. Despite the fact that on each iterati

 ## Implementation

-Implementation turns out to be rather laconic:
+Implementation turns out to be rather concise:

 ```cpp
 vector<int> z_function(string s) {
-    int n = (int) s.length();
-    vector<int> z(n);
-    int l = 0, r = 0;
-    for (int i = 1; i < n; ++i) {
-        if (i <= r)
-            z[i] = min (r - i + 1, z[i - l]);
-        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
-            ++z[i];
-        if (i + z[i] - 1 > r)
-            l = i, r = i + z[i] - 1;
+    int n = s.size();
+    vector<int> z(n);
+    int l = 0, r = 0;
+    for(int i = 1; i < n; i++) {
+        if(i < r) {
+            z[i] = min(r - i, z[i - l]);
         }
-    return z;
+        while(s[z[i]] == s[i + z[i]]) {
+            z[i]++;
+        }
+        if(i + z[i] > r) {
+            l = i;
+            r = i + z[i];
+        }
+    }
+    return z;
 }
 ```

 ### Comments on this implementation

 The whole solution is given as a function which returns an array of length $n$ -- the Z-function of $s$.

-Array $z$ is initially filled with zeros. The current rightmost match segment is assumed to be $[0; 0]$ (that is, a deliberately small segment which doesn't contain any $i$).
+Array $z$ is initially filled with zeros. The current rightmost match segment is assumed to be $[0; 0)$ (that is, a deliberately small segment which doesn't contain any $i$).

 Inside the loop for $i = 1 \dots n - 1$ we first determine the initial value $z[i]$ -- it will either remain zero or be computed using the above formula.

 Thereafter, the trivial algorithm attempts to increase the value of $z[i]$ as much as possible.

-In the end, if it's required (that is, if $i + z[i] - 1 > r$), we update the rightmost match segment $[l, r]$.
+In the end, if it's required (that is, if $i + z[i] > r$), we update the rightmost match segment $[l, r)$.

 ## Asymptotic behavior of the algorithm
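One way to sanity-check the new implementation is to compare it with the trivial quadratic algorithm. The sketch below copies `z_function` from the "+" side of the hunk. The helpers `z_naive` and `agrees_with_naive` are hypothetical names added for illustration. The missing bound in the `while` loop appears to rely on two assumptions: `s[s.size()]` reads as `'\0'` (true for `std::string` since C++11), and the input contains no embedded `'\0'` characters.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using namespace std;

// As in the "+" side of the diff: no explicit bound in the while loop.
// Assumption: s has no embedded '\0', so the terminator at s[s.size()]
// always ends the comparison before reading past it.
vector<int> z_function(string s) {
    int n = s.size();
    vector<int> z(n);
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        if (i < r) {
            z[i] = min(r - i, z[i - l]);
        }
        while (s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }
        if (i + z[i] > r) {
            l = i;
            r = i + z[i];
        }
    }
    return z;
}

// Hypothetical reference: the trivial O(n^2) algorithm with an explicit bound.
vector<int> z_naive(const string& s) {
    int n = s.size();
    vector<int> z(n);
    for (int i = 1; i < n; i++) {
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }
    }
    return z;
}

// Hypothetical check: true when both versions produce the same array.
bool agrees_with_naive(const string& s) {
    assert(s.find('\0') == string::npos);  // precondition stated above
    return z_function(s) == z_naive(s);
}
```

A call such as `agrees_with_naive("aaaabaa")` compares the two results for the example string used in the article. Inputs that may contain `'\0'` would need the explicit `i + z[i] < n` bound.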

@@ -126,29 +130,29 @@ We will show that **each iteration** of the `while` loop will increase the right

 To do that, we will consider both branches of the algorithm:

-* $i > r$
+* $i \geq r$

 In this case, either the `while` loop won't make any iteration (if $s[0] \ne s[i]$), or it will take a few iterations, starting at position $i$, each time moving one character to the right. After that, the right border $r$ will necessarily be updated.

-So we have found that, when $i > r$, each iteration of the `while` loop increases the value of the new $r$ index.
+So we have found that, when $i \geq r$, each iteration of the `while` loop increases the value of the new $r$ index.

-* $i \le r$
+* $i < r$

-In this case, we initialize $z[i]$ to a certain value $z_0$ given by the above formula. Let's compare this initial value $z_0$ to the value $r - i + 1$. We will have three cases:
+In this case, we initialize $z[i]$ to a certain value $z_0$ given by the above formula. Let's compare this initial value $z_0$ to the value $r - i$. We will have three cases:

-* $z_0 < r - i + 1$
+* $z_0 < r - i$

 We prove that in this case no iteration of the `while` loop will take place.

-It's easy to prove, for example, by contradiction: if the `while` loop made at least one iteration, it would mean that initial approximation $z[i] = z_0$ was inaccurate (less than the match's actual length). But since $s[l \dots r]$ and $s[0 \dots r-l]$ are the same, this would imply that $z[i-l]$ holds the wrong value (less than it should be).
+It's easy to prove, for example, by contradiction: if the `while` loop made at least one iteration, it would mean that initial approximation $z[i] = z_0$ was inaccurate (less than the match's actual length). But since $s[l \dots r)$ and $s[0 \dots r-l)$ are the same, this would imply that $z[i-l]$ holds the wrong value (less than it should be).

-Thus, since $z[i-l]$ is correct and it is less than $r - i + 1$, it follows that this value coincides with the required value $z[i]$.
+Thus, since $z[i-l]$ is correct and it is less than $r - i$, it follows that this value coincides with the required value $z[i]$.

-* $z_0 = r - i + 1$
+* $z_0 = r - i$

-In this case, the `while` loop can make a few iterations, but each of them will lead to an increase in the value of the $r$ index because we will start comparing from $s[r+1]$, which will climb beyond the $[l, r]$ interval.
+In this case, the `while` loop can make a few iterations, but each of them will lead to an increase in the value of the $r$ index because we will start comparing from $s[r]$, which will climb beyond the $[l, r)$ interval.

-* $z_0 > r - i + 1$
+* $z_0 > r - i$

 This option is impossible, by definition of $z_0$.
