From fbbbfee6d28d7dca2eeb447dd2fe60d9257f87a9 Mon Sep 17 00:00:00 2001 From: Atharva Thorve Date: Thu, 7 Nov 2024 12:48:34 -0500 Subject: [PATCH 1/3] Fixed spelling mistake in suffix-automaton.md --- src/string/suffix-automaton.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/string/suffix-automaton.md b/src/string/suffix-automaton.md index 445b7ecb3..c75b2bca0 100644 --- a/src/string/suffix-automaton.md +++ b/src/string/suffix-automaton.md @@ -221,7 +221,7 @@ Let us describe this process: (Initially we set $last = 0$, and we will change $last$ in the last step of the algorithm accordingly.) - Create a new state $cur$, and assign it with $len(cur) = len(last) + 1$. The value $link(cur)$ is not known at the time. - - Now we to the following procedure: + - Now we do the following procedure: We start at the state $last$. While there isn't a transition through the letter $c$, we will add a transition to the state $cur$, and follow the suffix link. If at some point there already exists a transition through the letter $c$, then we will stop and denote this state with $p$. From 6b2c74beb362ca425834f1264208434f5a584b14 Mon Sep 17 00:00:00 2001 From: Atharva Thorve Date: Thu, 7 Nov 2024 14:48:08 -0500 Subject: [PATCH 2/3] Fixed another mistake in the algorithm section --- src/string/suffix-automaton.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/string/suffix-automaton.md b/src/string/suffix-automaton.md index c75b2bca0..3644b4350 100644 --- a/src/string/suffix-automaton.md +++ b/src/string/suffix-automaton.md @@ -225,7 +225,7 @@ Let us describe this process: We start at the state $last$. While there isn't a transition through the letter $c$, we will add a transition to the state $cur$, and follow the suffix link. If at some point there already exists a transition through the letter $c$, then we will stop and denote this state with $p$. 
- - If it haven't found such a state $p$, then we reached the fictitious state $-1$, then we can just assign $link(cur) = 0$ and leave. + - If we haven't found such a state $p$, then we reached the fictitious state $-1$, then we can just assign $link(cur) = 0$ and leave. - Suppose now that we have found a state $p$, from which there exists a transition through the letter $c$. We will denote the state, to which the transition leads, with $q$. - Now we have two cases. Either $len(p) + 1 = len(q)$, or not. From a736d00add106bf933040b6c803a2065d9768cf1 Mon Sep 17 00:00:00 2001 From: Atharva Thorve Date: Thu, 7 Nov 2024 15:42:21 -0500 Subject: [PATCH 3/3] More fixes in suffix-automaton.md --- src/string/suffix-automaton.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/src/string/suffix-automaton.md b/src/string/suffix-automaton.md index 3644b4350..47626684e 100644 --- a/src/string/suffix-automaton.md +++ b/src/string/suffix-automaton.md @@ -241,7 +241,7 @@ Let us describe this process: - In any of the three cases, after completing the procedure, we update the value $last$ with the state $cur$. -If we also want to know which states are **terminal** and which are not, the we can find all terminal states after constructing the complete suffix automaton for the entire string $s$. +If we also want to know which states are **terminal** and which are not, we can find all terminal states after constructing the complete suffix automaton for the entire string $s$. To do this, we take the state corresponding to the entire string (stored in the variable $last$), and follow its suffix links until we reach the initial state. We will mark all visited states as terminal. It is easy to understand that by doing so we will mark exactly the states corresponding to all the suffixes of the string $s$, which are exactly the terminal states. 
@@ -280,7 +280,7 @@ The linearity of the number of transitions, and in general the linearity of the - In the second case we came across an existing transition $(p, q)$. This means that we tried to add a string $x + c$ (where $x$ is a suffix of $s$) to the machine that **already exists** in the machine (the string $x + c$ already appears as a substring of $s$). - Since we assume that the automaton for the string $s$ is build correctly, we should not add a new transition here. + Since we assume that the automaton for the string $s$ is built correctly, we should not add a new transition here. However there is a difficulty. To which state should the suffix link from the state $cur$ lead? @@ -324,7 +324,7 @@ If we consider all parts of the algorithm, then it contains three places in the - The second place is the copying of transitions when the state $q$ is cloned into a new state $clone$. - Third place is changing the transition leading to $q$, redirecting them to $clone$. -We use the fact that the size of the suffix automaton (both in number of states and in the number of transitions) is **linear**. +We use the fact that the size of the suffix automaton (both in the number of states and in the number of transitions) is **linear**. (The proof of the linearity of the number of states is the algorithm itself, and the proof of linearity of the number of states is given below, after the implementation of the algorithm). Thus the total complexity of the **first and second places** is obvious, after all each operation adds only one amortized new transition to the automaton. @@ -334,7 +334,7 @@ We denote $v = longest(p)$. This is a suffix of the string $s$, and with each iteration its length decreases - and therefore the position $v$ as the suffix of the string $s$ increases monotonically with each iteration. 
In this case, if before the first iteration of the loop, the corresponding string $v$ was at the depth $k$ ($k \ge 2$) from $last$ (by counting the depth as the number of suffix links), then after the last iteration the string $v + c$ will be a $2$-th suffix link on the path from $cur$ (which will become the new value $last$). -Thus, each iteration of this loop leads to the fact that the position of the string $longest(link(link(last))$ as suffix of the current string will monotonically increase. +Thus, each iteration of this loop leads to the fact that the position of the string $longest(link(link(last))$ as a suffix of the current string will monotonically increase. Therefore this cycle cannot be executed more than $n$ iterations, which was required to prove. ### Implementation @@ -444,7 +444,7 @@ Let the current non-continuous transition be $(p, q)$ with the character $c$. We take the correspondent string $u + c + w$, where the string $u$ corresponds to the longest path from the initial state to $p$, and $w$ to the longest path from $q$ to any terminal state. On one hand, each such string $u + c + w$ for each incomplete strings will be different (since the strings $u$ and $w$ are formed only by complete transitions). On the other hand each such string $u + c + w$, by the definition of the terminal states, will be a suffix of the entire string $s$. -Since there are only $n$ non-empty suffixes of $s$, and non of the strings $u + c + w$ can contain $s$ (because the entire string only contains complete transitions), the total number of incomplete transitions does not exceed $n - 1$. +Since there are only $n$ non-empty suffixes of $s$, and none of the strings $u + c + w$ can contain $s$ (because the entire string only contains complete transitions), the total number of incomplete transitions does not exceed $n - 1$. Combining these two estimates gives us the bound $3n - 3$. 
However, since the maximum number of states can only be achieved with the test case $\text{"abbb\dots bbb"}$ and this case has clearly less than $3n - 3$ transitions, we get the tighter bound of $3n - 4$ for the number of transitions in a suffix automaton. @@ -460,7 +460,7 @@ For the simplicity we assume that the alphabet size $k$ is constant, which allow ### Check for occurrence -Given a text $T$, and multiple patters $P$. +Given a text $T$, and multiple patterns $P$. We have to check whether or not the strings $P$ appear as a substring of $T$. We build a suffix automaton of the text $T$ in $O(length(T))$ time. @@ -525,7 +525,7 @@ The value $ans[v]$ can be computed using the recursion: $$ans[v] = \sum_{w : (v, w, c) \in DAWG} d[w] + ans[w]$$ -We take the answer of each adjacent vertex $w$, and add to it $d[w]$ (since every substrings is one character longer when starting from the state $v$). +We take the answer of each adjacent vertex $w$, and add to it $d[w]$ (since every substring is one character longer when starting from the state $v$). Again this task can be computed in $O(length(S))$ time. @@ -547,7 +547,7 @@ long long get_tot_len_diff_substings() { } ``` -This approaches runs in $O(length(S))$ time, but experimentally runs 20x faster than the memoized dynamic programming version on randomized strings. It requires no extra space and no recursion. +This approach runs in $O(length(S))$ time, but experimentally runs 20x faster than the memoized dynamic programming version on randomized strings. It requires no extra space and no recursion. ### Lexicographically $k$-th substring {data-toc-label="Lexicographically k-th substring"} @@ -555,7 +555,7 @@ Given a string $S$. We have to answer multiple queries. For each given number $K_i$ we have to find the $K_i$-th string in the lexicographically ordered list of all substrings. -The solution of this problem is based on the idea of the previous two problems. 
+The solution to this problem is based on the idea of the previous two problems. The lexicographically $k$-th substring corresponds to the lexicographically $k$-th path in the suffix automaton. Therefore after counting the number of paths from each state, we can easily search for the $k$-th path starting from the root of the automaton. @@ -577,7 +577,7 @@ Total time complexity is $O(length(S))$. For a given text $T$. We have to answer multiple queries. -For each given pattern $P$ we have to find out how many times the string $P$ appears in the string $T$ as substring. +For each given pattern $P$ we have to find out how many times the string $P$ appears in the string $T$ as a substring. We construct the suffix automaton for the text $T$. @@ -603,7 +603,7 @@ Therefore initially we have $cnt = 1$ for each such state, and $cnt = 0$ for all Then we apply the following operation for each $v$: $cnt[link(v)] \text{ += } cnt[v]$. The meaning behind this is, that if a string $v$ appears $cnt[v]$ times, then also all its suffixes appear at the exact same end positions, therefore also $cnt[v]$ times. -Why don't we overcount in this procedure (i.e. don't count some position twice)? +Why don't we overcount in this procedure (i.e. don't count some positions twice)? Because we add the positions of a state to only one other state, so it can not happen that one state directs its positions to another state twice in two different ways. Thus we can compute the quantities $cnt$ for all states in the automaton in $O(length(T))$ time. @@ -690,7 +690,7 @@ void output_all_occurrences(int v, int P_length) { ### Shortest non-appearing string Given a string $S$ and a certain alphabet. -We have to find a string of smallest length, that doesn't appear in $S$. +We have to find a string of the smallest length, that doesn't appear in $S$. We will apply dynamic programming on the suffix automaton built for the string $S$. 
@@ -706,7 +706,7 @@ The answer to the problem will be $d[t_0]$, and the actual string can be restore ### Longest common substring of two strings Given two strings $S$ and $T$. -We have to find the longest common substring, i.e. such a string $X$ that appears as substring in $S$ and also in $T$. +We have to find the longest common substring, i.e. such a string $X$ that appears as a substring in $S$ and also in $T$. We construct a suffix automaton for the string $S$.
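For readers checking the reworded construction steps in this series (create $cur$, walk suffix links from $last$ adding transitions on $c$, then handle the "no $p$ found", $len(p) + 1 = len(q)$, and clone cases), the following is a minimal C++ sketch of that procedure. It is an illustration under stated assumptions, not the article's own implementation: the names `State`, `SuffixAutomaton`, `extend`, `contains`, and `count_distinct_substrings` are hypothetical and chosen only for this sketch, and transitions are stored in a `std::map`, which gives $O(n \log k)$ rather than $O(n)$ construction.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical names for this sketch; not taken from the patched article.
struct State {
    int len = 0;               // length of the longest string in this state
    int link = -1;             // suffix link; -1 is the fictitious state
    std::map<char, int> next;  // transitions by character
};

struct SuffixAutomaton {
    std::vector<State> st;
    int last = 0;  // state corresponding to the whole string read so far

    explicit SuffixAutomaton(const std::string& s) {
        st.reserve(2 * s.size() + 1);
        st.emplace_back();  // initial state 0 with len = 0, link = -1
        for (char c : s) extend(c);
    }

    void extend(char c) {
        int cur = static_cast<int>(st.size());
        st.emplace_back();
        st[cur].len = st[last].len + 1;

        // Walk suffix links from last, adding missing transitions on c.
        int p = last;
        while (p != -1 && st[p].next.find(c) == st[p].next.end()) {
            st[p].next[c] = cur;
            p = st[p].link;
        }

        if (p == -1) {
            // No state p with a transition on c: we reached the fictitious state.
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
            } else {
                // Clone q, keeping its transitions and link but with a shorter len.
                State copy = st[q];
                copy.len = st[p].len + 1;
                int clone = static_cast<int>(st.size());
                st.push_back(copy);
                // Redirect transitions that led to q so that they lead to clone.
                while (p != -1) {
                    auto it = st[p].next.find(c);
                    if (it == st[p].next.end() || it->second != q) break;
                    it->second = clone;
                    p = st[p].link;
                }
                st[q].link = clone;
                st[cur].link = clone;
            }
        }
        last = cur;  // in all three cases
    }

    // "Check for occurrence": follow transitions from the initial state.
    bool contains(const std::string& pattern) const {
        int v = 0;
        for (char c : pattern) {
            auto it = st[v].next.find(c);
            if (it == st[v].next.end()) return false;
            v = it->second;
        }
        return true;
    }

    // Each non-initial state v contributes len(v) - len(link(v)) distinct substrings.
    long long count_distinct_substrings() const {
        long long total = 0;
        for (std::size_t v = 1; v < st.size(); ++v)
            total += st[v].len - st[st[v].link].len;
        return total;
    }
};
```

As a usage example, `SuffixAutomaton sa("abcbc");` followed by `sa.contains("bcb")` walks three transitions from the initial state, and `sa.count_distinct_substrings()` sums the per-state contributions. Swapping the `std::map` for a fixed-size array would match the constant-alphabet assumption the article makes, at the cost of more memory per state.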
