From 4a1113386930220be9066f3ac4e6e35bc3a07452 Mon Sep 17 00:00:00 2001
From: Michael Hayter
Date: Sun, 30 Apr 2023 07:33:19 -0400
Subject: [PATCH 1/6] Quick edit + partially addressing Issue #924.

Change minlength(v) to minlen(v) for consistency.
Issue #924 improvement. Will complete after feedback!
---
 src/string/suffix-automaton.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/string/suffix-automaton.md b/src/string/suffix-automaton.md
index 2ae0771c4..5037ab906 100644
--- a/src/string/suffix-automaton.md
+++ b/src/string/suffix-automaton.md
@@ -193,7 +193,7 @@ Before proceeding to the algorithm itself, we recap the accumulated knowledge, a
 - For each state $v$ one or multiple substrings match.
  We denote by $longest(v)$ the longest such string, and through $len(v)$ its length.
  We denote by $shortest(v)$ the shortest such substring, and its length with $minlen(v)$.
- Then all the strings corresponding to this state are different suffixes of the string $longest(v)$ and have all possible lengths in the interval $[minlength(v); len(v)]$.
+ Then all the strings corresponding to this state are different suffixes of the string $longest(v)$ and have all possible lengths in the interval $[minlen(v); len(v)]$.
 - For each state $v \ne t_0$ a suffix link is defined as a link, that leads to a state that corresponds to the suffix of the string $longest(v)$ of length $minlen(v) - 1$.
  The suffix links form a tree with the root in $t_0$, and at the same time this tree forms an inclusion relationship between the sets $endpos$.
 - We can express $minlen(v)$ for $v \ne t_0$ using the suffix link $link(v)$ as:
@@ -494,6 +494,21 @@ The number of different substrings is the value $d[t_0] - 1$ (since we don't cou
 Total time complexity: $O(length(S))$
+
+Alternatively, we can take advantage of the fact that each state $v$ matches to substrings of length $[minlen(v),len(v)]$.
+Therefore, given $minlen(v) = 1 + len(link(v))$, we have total distinct substrings at state $v$ being $len(v) - minlen(v) + 1 = len(v) - 1 + len(link(v)) + 1 = len(v) - len(link(v))$.
+
+This is demonstrated succinctly below:
+
+```cpp
+long long tot{};
+for(int i=1;i
Date: Sun, 30 Apr 2023 21:01:18 -0400
Subject: [PATCH 2/6] Forgot parenthesis around minlen(v) expansion

---
 src/string/suffix-automaton.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/string/suffix-automaton.md b/src/string/suffix-automaton.md
index 5037ab906..554a7d627 100644
--- a/src/string/suffix-automaton.md
+++ b/src/string/suffix-automaton.md
@@ -496,7 +496,7 @@ Total time complexity: $O(length(S))$
 Alternatively, we can take advantage of the fact that each state $v$ matches to substrings of length $[minlen(v),len(v)]$.
-Therefore, given $minlen(v) = 1 + len(link(v))$, we have total distinct substrings at state $v$ being $len(v) - minlen(v) + 1 = len(v) - 1 + len(link(v)) + 1 = len(v) - len(link(v))$.
+Therefore, given $minlen(v) = 1 + len(link(v))$, we have total distinct substrings at state $v$ being $len(v) - minlen(v) + 1 = len(v) - (1 + len(link(v))) + 1 = len(v) - len(link(v))$.
 This is demonstrated succinctly below:

From da025de7bda78c323c317cc9e74b71334b21233b Mon Sep 17 00:00:00 2001
From: Michael Hayter
Date: Sun, 30 Apr 2023 22:08:28 -0400
Subject: [PATCH 3/6] Updated c++ code to work on current style of suffix automaton

This is tested on current suffix automaton.
Also, made more modular
---
 src/string/suffix-automaton.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/string/suffix-automaton.md b/src/string/suffix-automaton.md
index 554a7d627..de4ae289a 100644
--- a/src/string/suffix-automaton.md
+++ b/src/string/suffix-automaton.md
@@ -501,9 +501,12 @@ Therefore, given $minlen(v) = 1 + len(link(v))$, we have total distinct substrin
 This is demonstrated succinctly below:
 ```cpp
-long long tot{};
-for(int i=1;i
Date: Tue, 2 May 2023 02:00:18 -0400
Subject: [PATCH 4/6] complete issue #924

I've now resolved issue #924. I added new techniques to calculating the total length of different strings.
---
 src/string/suffix-automaton.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/string/suffix-automaton.md b/src/string/suffix-automaton.md
index de4ae289a..46ec5db19 100644
--- a/src/string/suffix-automaton.md
+++ b/src/string/suffix-automaton.md
@@ -510,7 +510,7 @@ long long get_diff_strings(){
 }
 ```
-While this is also $O(length(S))$, it requires no extra space besides (what's used for the suffix automaton construction) and no recursive calls, consequently running faster in practice.
+While this is also $O(length(S))$, it requires no extra space and no recursive calls, consequently running faster in practice.
 ### Total length of all different substrings
@@ -529,6 +529,26 @@ We take the answer of each adjacent vertex $w$, and add to it $d[w]$ (since ever
 Again this task can be computed in $O(length(S))$ time.
+Alternatively, we can, again, take advantage of the fact that each state $v$ matches to substrings of length $[minlen(v),len(v)]$.
+Since $minlen(v) = 1 + len(link(v))$ and the arimetic series formula $S[n] = n * (a[1]+a[n]) / 2$ (where $S[n]$ denotes the sum of $n$ terms,$a[1]$ representing the first term, and $a[n]$ representing the last), we can compute the length of substrings at a state in constant time. We then sum up these totals for each state $v \neq t[0]$ in the automaton. This is shown by the code below:
+
+```cpp
+long long get_tot_len_diff_substings() {
+ long long tot{};
+ for(int i=1;i
Date: Tue, 2 May 2023 02:15:45 -0400
Subject: [PATCH 5/6] notation rendering changes

notation changes as preview wasn't working.
---
 src/string/suffix-automaton.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/string/suffix-automaton.md b/src/string/suffix-automaton.md
index 46ec5db19..ec7ace253 100644
--- a/src/string/suffix-automaton.md
+++ b/src/string/suffix-automaton.md
@@ -530,7 +530,7 @@ We take the answer of each adjacent vertex $w$, and add to it $d[w]$ (since ever
 Again this task can be computed in $O(length(S))$ time.
 Alternatively, we can, again, take advantage of the fact that each state $v$ matches to substrings of length $[minlen(v),len(v)]$.
-Since $minlen(v) = 1 + len(link(v))$ and the arimetic series formula $S[n] = n * (a[1]+a[n]) / 2$ (where $S[n]$ denotes the sum of $n$ terms,$a[1]$ representing the first term, and $a[n]$ representing the last), we can compute the length of substrings at a state in constant time. We then sum up these totals for each state $v \neq t[0]$ in the automaton. This is shown by the code below:
+Since $minlen(v) = 1 + len(link(v))$ and the arithmetic series formula $S_n = n * (a_1+a_n) / 2$ (where $S_n$ denotes the sum of $n$ terms, $a_1$ representing the first term, and $a_n$ representing the last), we can compute the length of substrings at a state in constant time. We then sum up these totals for each state $v \neq t_0$ in the automaton. This is shown by the code below:
 ```cpp
 long long get_tot_len_diff_substings() {

From af01de8daa0356a4d1b391362d0f4bbc203bb604 Mon Sep 17 00:00:00 2001
From: Jakob Kogler
Date: Sun, 7 May 2023 11:55:15 +0200
Subject: [PATCH 6/6] fix some formatting

---
 src/string/suffix-automaton.md | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/string/suffix-automaton.md b/src/string/suffix-automaton.md
index ec7ace253..07afdbe5d 100644
--- a/src/string/suffix-automaton.md
+++ b/src/string/suffix-automaton.md
@@ -502,9 +502,9 @@ This is demonstrated succinctly below:
 ```cpp
 long long get_diff_strings(){
- ll tot{};
- for(int i=1;i
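
Note on the two counting loops: both `for` loops added by this series are cut off in this copy of the patches, so their bodies are not reproduced here. The following is only a sketch of the technique the added prose describes: for each state $v \ne t_0$, count $len(v) - len(link(v))$ substrings, and sum their lengths with the arithmetic series formula. The `SuffixAutomaton` struct, its field names, and the method names `count_distinct` and `total_length` are assumptions made for illustration; they are not taken from the patched file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal suffix automaton (names are assumptions, see note above).
struct State {
    int len = 0;               // len(v): length of the longest string in the state
    int link = -1;             // link(v): suffix link, -1 only for the initial state
    std::map<char, int> next;  // transitions
};

struct SuffixAutomaton {
    std::vector<State> st;
    int last = 0;

    explicit SuffixAutomaton(const std::string& s) {
        st.reserve(2 * s.size() + 1);  // at most 2n - 1 states are ever created
        st.emplace_back();             // state 0 is the initial state t_0
        for (char c : s) extend(c);
    }

    void extend(char c) {
        int cur = static_cast<int>(st.size());
        st.emplace_back();
        st[cur].len = st[last].len + 1;
        int p = last;
        while (p != -1 && !st[p].next.count(c)) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
            } else {
                int clone = static_cast<int>(st.size());
                st.push_back(st[q]);  // clone keeps q's transitions and link
                st[clone].len = st[p].len + 1;
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;
                    p = st[p].link;
                }
                st[q].link = st[cur].link = clone;
            }
        }
        last = cur;
    }

    // Number of distinct non-empty substrings:
    // state v != t_0 holds len(v) - len(link(v)) strings.
    long long count_distinct() const {
        long long tot = 0;
        for (std::size_t v = 1; v < st.size(); v++) {
            assert(st[v].link >= 0);  // every non-initial state has a suffix link
            tot += st[v].len - st[st[v].link].len;
        }
        return tot;
    }

    // Total length of all distinct substrings: the lengths in state v are
    // minlen(v), ..., len(v), summed with n * (a_1 + a_n) / 2.
    long long total_length() const {
        long long tot = 0;
        for (std::size_t v = 1; v < st.size(); v++) {
            assert(st[v].link >= 0);
            long long longest = st[v].len;
            long long shortest = st[st[v].link].len + 1;  // minlen(v)
            long long count = longest - shortest + 1;
            tot += count * (shortest + longest) / 2;      // product is always even
        }
        return tot;
    }
};
```

For the string `aba`, the distinct substrings are `a`, `b`, `ab`, `ba`, `aba`, so `SuffixAutomaton("aba").count_distinct()` gives 5 and `total_length()` gives 9. Both loops visit each state once and use no auxiliary arrays, which matches the series' claim of $O(length(S))$ time with no extra space beyond the automaton itself.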
