Commit e291a7e

Merge pull request #1082 from mhayter/patch-5
Quick edit + Resolved issue #924.
2 parents e9b906f + af01de8 commit e291a7e

1 file changed

+39
-1
lines changed

src/string/suffix-automaton.md

Lines changed: 39 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -193,7 +193,7 @@ Before proceeding to the algorithm itself, we recap the accumulated knowledge, a
193193
- For each state $v$ one or multiple substrings match.
194194
We denote by $longest(v)$ the longest such string, and through $len(v)$ its length.
195195
We denote by $shortest(v)$ the shortest such substring, and its length with $minlen(v)$.
196-
Then all the strings corresponding to this state are different suffixes of the string $longest(v)$ and have all possible lengths in the interval $[minlength(v); len(v)]$.
196+
Then all the strings corresponding to this state are different suffixes of the string $longest(v)$ and have all possible lengths in the interval $[minlen(v); len(v)]$.
197197
- For each state $v \ne t_0$ a suffix link is defined as a link, that leads to a state that corresponds to the suffix of the string $longest(v)$ of length $minlen(v) - 1$.
198198
The suffix links form a tree with the root in $t_0$, and at the same time this tree forms an inclusion relationship between the sets $endpos$.
199199
- We can express $minlen(v)$ for $v \ne t_0$ using the suffix link $link(v)$ as:
@@ -494,6 +494,24 @@ The number of different substrings is the value $d[t_0] - 1$ (since we don't cou
494494
495495
Total time complexity: $O(length(S))$
496496
497+
498+
Alternatively, we can take advantage of the fact that each state $v$ corresponds to substrings whose lengths fill the interval $[minlen(v); len(v)]$.
499+
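Therefore, given $minlen(v) = 1 + len(link(v))$, the number of distinct substrings at state $v$ is $len(v) - minlen(v) + 1 = len(v) - (1 + len(link(v))) + 1 = len(v) - len(link(v))$. Since every distinct substring corresponds to exactly one state, summing this value over all states $v \ne t_0$ gives the answer.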
500+
501+
This is demonstrated succinctly below:
502+
503+
```cpp
504+
long long get_diff_strings(){
505+
long long tot = 0;
506+
for(int i = 1; i < sz; i++) {
507+
tot += st[i].len - st[st[i].link].len;
508+
}
509+
return tot;
510+
}
511+
```
512+
513+
While this is also $O(length(S))$, it requires no extra space and no recursive calls, so it tends to run faster in practice.
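
The loop above relies on `st` and `sz` from the automaton construction, which this hunk does not show. The following is a minimal self-contained sketch, assuming the standard online construction with `std::map` transitions. The names `suffix_automaton`, `extend`, `count_distinct_substrings`, `brute_force_count` and `self_check` are hypothetical and chosen only for illustration. It applies the same formula $len(v) - len(link(v))$ and compares the result with a naive set-based count.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One automaton state; `len` and `link` mirror the fields used in the diff.
struct state {
    int len = 0, link = -1;
    std::map<char, int> next;
};

// Hypothetical wrapper (an assumption, not the article's code) that builds
// the suffix automaton of s one character at a time.
struct suffix_automaton {
    std::vector<state> st;
    int last = 0;

    explicit suffix_automaton(const std::string& s) {
        st.emplace_back();            // state 0 plays the role of t_0
        for (char c : s) extend(c);
    }

    void extend(char c) {
        int cur = (int)st.size();
        st.emplace_back();
        st[cur].len = st[last].len + 1;
        int p = last;
        while (p != -1 && !st[p].next.count(c)) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
            } else {
                state copy = st[q];   // clone keeps q's transitions and link
                copy.len = st[p].len + 1;
                int clone = (int)st.size();
                st.push_back(copy);
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;
                    p = st[p].link;
                }
                st[q].link = st[cur].link = clone;
            }
        }
        last = cur;
    }
};

// Sum len(v) - len(link(v)) over every state except t_0.
long long count_distinct_substrings(const std::string& s) {
    suffix_automaton sa(s);
    long long tot = 0;
    for (size_t v = 1; v < sa.st.size(); v++)
        tot += sa.st[v].len - sa.st[sa.st[v].link].len;
    return tot;
}

// Naive reference: collect every substring in a set (only for short strings).
long long brute_force_count(const std::string& s) {
    std::set<std::string> seen;
    for (size_t i = 0; i < s.size(); i++)
        for (size_t l = 1; i + l <= s.size(); l++)
            seen.insert(s.substr(i, l));
    return (long long)seen.size();
}

// Cross-check both counts on a few small inputs.
void self_check() {
    for (const char* t : {"", "a", "abab", "aaaa", "abcbc", "mississippi"})
        assert(count_distinct_substrings(t) == brute_force_count(t));
}
```

For example, `count_distinct_substrings("abab")` yields 7, matching the substrings a, b, ab, ba, aba, bab and abab. The `std::map` transitions keep the sketch short; a fixed-size array per state would be the usual choice when the alphabet is small.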
514+
497515
### Total length of all different substrings
498516

499517
Given a string $S$.
@@ -511,6 +529,26 @@ We take the answer of each adjacent vertex $w$, and add to it $d[w]$ (since ever
511529

512530
Again this task can be computed in $O(length(S))$ time.
513531

532+
Alternatively, we can again take advantage of the fact that each state $v$ corresponds to substrings whose lengths fill the interval $[minlen(v); len(v)]$.
533+
Since $minlen(v) = 1 + len(link(v))$, the lengths of the substrings at state $v$ form an arithmetic series from $minlen(v)$ to $len(v)$. Using the formula $S_n = n \cdot \frac{a_1+a_n}{2}$ (where $S_n$ denotes the sum of $n$ terms, $a_1$ the first term, and $a_n$ the last), we can compute the total length of the substrings at a state in constant time. We then sum up these totals for each state $v \neq t_0$ in the automaton. This is shown by the code below:
534+
535+
```cpp
536+
long long get_tot_len_diff_substings() {
537+
long long tot = 0;
538+
for(int i = 1; i < sz; i++) {
539+
long long shortest = st[st[i].link].len + 1;
540+
long long longest = st[i].len;
541+
542+
long long num_strings = longest - shortest + 1;
543+
long long cur = num_strings * (longest + shortest) / 2;
544+
tot += cur;
545+
}
546+
return tot;
547+
}
548+
```
549+
550+
This approach also runs in $O(length(S))$ time, but in experiments it ran about 20x faster than the memoized dynamic programming version on randomized strings. It requires no extra space and no recursion.
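
To see the arithmetic series argument end to end, here is a minimal self-contained sketch, again assuming the standard online construction with `std::map` transitions. The names `suffix_automaton`, `extend`, `total_length_distinct_substrings`, `brute_force_total` and `self_check` are hypothetical and not part of the article. It sums $n \cdot \frac{a_1 + a_n}{2}$ per state, with $a_1 = len(link(v)) + 1$ and $a_n = len(v)$, and checks the result against a naive enumeration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One automaton state; `len` and `link` mirror the fields used in the diff.
struct state {
    int len = 0, link = -1;
    std::map<char, int> next;
};

// Hypothetical wrapper (an assumption, not the article's code).
struct suffix_automaton {
    std::vector<state> st;
    int last = 0;

    explicit suffix_automaton(const std::string& s) {
        st.emplace_back();            // state 0 plays the role of t_0
        for (char c : s) extend(c);
    }

    void extend(char c) {
        int cur = (int)st.size();
        st.emplace_back();
        st[cur].len = st[last].len + 1;
        int p = last;
        while (p != -1 && !st[p].next.count(c)) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
            } else {
                state copy = st[q];   // clone keeps q's transitions and link
                copy.len = st[p].len + 1;
                int clone = (int)st.size();
                st.push_back(copy);
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;
                    p = st[p].link;
                }
                st[q].link = st[cur].link = clone;
            }
        }
        last = cur;
    }
};

// For each state v != t_0, add the sum of the lengths shortest..longest.
long long total_length_distinct_substrings(const std::string& s) {
    suffix_automaton sa(s);
    long long tot = 0;
    for (size_t v = 1; v < sa.st.size(); v++) {
        long long shortest = sa.st[sa.st[v].link].len + 1;
        long long longest = sa.st[v].len;
        long long num_strings = longest - shortest + 1;
        tot += num_strings * (shortest + longest) / 2;  // product is always even
    }
    return tot;
}

// Naive reference: add up the lengths of all distinct substrings.
long long brute_force_total(const std::string& s) {
    std::set<std::string> seen;
    for (size_t i = 0; i < s.size(); i++)
        for (size_t l = 1; i + l <= s.size(); l++)
            seen.insert(s.substr(i, l));
    long long tot = 0;
    for (const std::string& t : seen) tot += (long long)t.size();
    return tot;
}

// Cross-check both totals on a few small inputs.
void self_check() {
    for (const char* t : {"", "a", "abab", "aaaa", "abcbc", "mississippi"})
        assert(total_length_distinct_substrings(t) == brute_force_total(t));
}
```

For `"abab"` the four non-initial states cover the length ranges $[1;1]$, $[1;2]$, $[2;3]$ and $[3;4]$, so the total is $1 + 3 + 5 + 7 = 16$. The division by 2 is exact because the sum of consecutive integers is always an integer.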
551+
514552
### Lexicographically $k$-th substring {data-toc-label="Lexicographically k-th substring"}
515553

516554
Given a string $S$.
