src/string/suffix-automaton.md
Lines changed: 39 additions & 1 deletion
@@ -193,7 +193,7 @@ Before proceeding to the algorithm itself, we recap the accumulated knowledge, a
193
193
- For each state $v$ one or multiple substrings match.
194
194
We denote by $longest(v)$ the longest such string, and through $len(v)$ its length.
195
195
We denote by $shortest(v)$ the shortest such substring, and its length with $minlen(v)$.
196
-
Then all the strings corresponding to this state are different suffixes of the string $longest(v)$ and have all possible lengths in the interval $[minlength(v); len(v)]$.
196
+
Then all the strings corresponding to this state are different suffixes of the string $longest(v)$ and have all possible lengths in the interval $[minlen(v); len(v)]$.
197
197
- For each state $v \ne t_0$ a suffix link is defined as a link, that leads to a state that corresponds to the suffix of the string $longest(v)$ of length $minlen(v) - 1$.
198
198
The suffix links form a tree with the root in $t_0$, and at the same time this tree forms an inclusion relationship between the sets $endpos$.
199
199
- We can express $minlen(v)$ for $v \ne t_0$ using the suffix link $link(v)$ as:
@@ -494,6 +494,24 @@ The number of different substrings is the value $d[t_0] - 1$ (since we don't cou
494
494
495
495
Total time complexity: $O(length(S))$
496
496
497
+
498
+
Alternatively, we can take advantage of the fact that each state $v$ corresponds to substrings of every length in the interval $[minlen(v); len(v)]$.
499
+
Since $minlen(v) = 1 + len(link(v))$, the number of distinct substrings corresponding to state $v$ is $len(v) - minlen(v) + 1 = len(v) - (1 + len(link(v))) + 1 = len(v) - len(link(v))$. Summing this value over all states $v \ne t_0$ gives the answer.
500
+
501
+
This is demonstrated succinctly below:
502
+
503
+
```cpp
long long get_diff_strings() {
    long long tot = 0;
    // Every state except the initial one (index 0) contributes
    // len(v) - len(link(v)) distinct substrings.
    for (int i = 1; i < sz; i++) {
        tot += st[i].len - st[st[i].link].len;
    }
    return tot;
}
```
512
+
513
+
This approach is also $O(length(S))$, but it needs no auxiliary array and no recursive calls, so it tends to run faster in practice.
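
To try the formula in isolation, here is a minimal self-contained sketch. The names `State`, `SuffixAutomaton` and `count_distinct_substrings` are hypothetical and chosen only for this illustration; the construction follows the standard online algorithm, with `std::map` transitions assumed for simplicity rather than speed.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal suffix automaton used only for this illustration.
struct State {
    int len = 0;                // len(v): length of the longest string of the state
    int link = -1;              // suffix link; -1 only for the initial state t_0
    std::map<char, int> next;   // transitions
};

struct SuffixAutomaton {
    std::vector<State> st;
    int last = 0;

    explicit SuffixAutomaton(const std::string& s) {
        st.reserve(2 * s.size() + 1);
        st.emplace_back();      // initial state t_0 at index 0
        for (char c : s) extend(c);
    }

    void extend(char c) {
        int cur = (int)st.size();
        st.emplace_back();
        st[cur].len = st[last].len + 1;
        int p = last;
        // Add the transition by c along the suffix links until one already exists.
        while (p != -1 && !st[p].next.count(c)) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
            } else {
                // Split q: the clone keeps q's transitions and suffix link.
                int clone = (int)st.size();
                st.push_back(st[q]);
                st[clone].len = st[p].len + 1;
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;
                    p = st[p].link;
                }
                st[q].link = st[cur].link = clone;
            }
        }
        last = cur;
    }
};

// Sums len(v) - len(link(v)) over all states except t_0.
long long count_distinct_substrings(const std::string& s) {
    SuffixAutomaton sa(s);
    long long tot = 0;
    for (size_t v = 1; v < sa.st.size(); v++)
        tot += sa.st[v].len - sa.st[sa.st[v].link].len;
    return tot;
}
```

For example, `count_distinct_substrings("abab")` returns 7, matching the substrings `a`, `b`, `ab`, `ba`, `aba`, `bab`, `abab`.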
514
+
497
515
### Total length of all different substrings
498
516
499
517
Given a string $S$.
@@ -511,6 +529,26 @@ We take the answer of each adjacent vertex $w$, and add to it $d[w]$ (since ever
511
529
512
530
Again this task can be computed in $O(length(S))$ time.
513
531
532
+
Alternatively, we can again take advantage of the fact that each state $v$ corresponds to substrings of every length in the interval $[minlen(v); len(v)]$.
533
+
Using $minlen(v) = 1 + len(link(v))$ and the arithmetic series formula $S_n = n \cdot \frac{a_1+a_n}{2}$ (where $S_n$ denotes the sum of $n$ terms, $a_1$ the first term, and $a_n$ the last), we can compute the total length of the substrings corresponding to a state in constant time. We then sum these totals over all states $v \neq t_0$ in the automaton. This is shown by the code below:
534
+
535
+
```cpp
long long get_tot_len_diff_substings() {
    long long tot = 0;
    for (int i = 1; i < sz; i++) {
        // Lengths of the substrings of state i form the range [shortest, longest].
        long long shortest = st[st[i].link].len + 1;
        long long longest = st[i].len;

        long long num_strings = longest - shortest + 1;
        // Arithmetic series: the product is always even, so the division is exact.
        long long cur = num_strings * (longest + shortest) / 2;
        tot += cur;
    }
    return tot;
}
```
549
+
550
+
This approach also runs in $O(length(S))$ time, but experimentally it runs 20x faster than the memoized dynamic programming version on randomized strings. It requires no extra space and no recursion.
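
When testing either closed-form loop, a brute-force reference can be handy for small inputs. The sketch below is only an illustration: `brute_force_substring_stats` is a hypothetical helper, not part of the article's code. It enumerates every substring explicitly, so it is only suitable for short strings.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical checker: returns {number of distinct non-empty substrings,
// total length of those substrings}, computed by explicit enumeration.
std::pair<long long, long long> brute_force_substring_stats(const std::string& s) {
    std::set<std::string> seen;
    for (size_t i = 0; i < s.size(); i++)
        for (size_t len = 1; i + len <= s.size(); len++)
            seen.insert(s.substr(i, len));

    long long total_len = 0;
    for (const std::string& t : seen)
        total_len += (long long)t.size();
    return {(long long)seen.size(), total_len};
}
```

For `"abab"` this yields 7 distinct substrings with total length 16, the values the automaton-based loops should reproduce.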