| 1 | +<!--?title Lyndon factorization --> |
| 2 | +# Lyndon factorization |
| 3 | + |
| 4 | +## Lyndon factorization |
| 5 | + |
| 6 | +First let us define the notion of the Lyndon factorization. |
| 7 | + |
| 8 | +A string is called **simple** (or a Lyndon word), if it is strictly **smaller than** any of its own nontrivial **suffixes**. |
| 9 | +Examples of simple strings are: $a$, $b$, $ab$, $aab$, $abb$, $ababb$, $abcd$. |
| 10 | +It can be shown that a string is simple, if and only if it is strictly **smaller than** all its nontrivial **cyclic shifts**. |
| 11 | + |
| 12 | +Next, let there be a given string $s$. |
| 13 | +The **Lyndon factorization** of the string $s$ is a factorization $s = w_1 w_2 \dots w_k$, where all strings $w_i$ are simple, and they are in non-increasing order $w_1 \ge w_2 \ge \dots \ge w_k$. |
| 14 | + |
| 15 | +It can be shown that for any string such a factorization exists and that it is unique. |
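
To make the two definitions above concrete, here is a minimal brute-force sketch that checks them directly. The names `is_simple` and `is_lyndon_factorization` are hypothetical helpers introduced only for illustration; they run in quadratic time and are meant for checking small examples, not as part of the Duval algorithm.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: s is simple if it is strictly smaller than
// every nontrivial (proper, nonempty) suffix. The empty string is
// treated as not simple here, which is an assumption of this sketch.
bool is_simple(std::string const& s) {
    if (s.empty())
        return false;
    for (std::size_t p = 1; p < s.size(); p++) {
        if (s >= s.substr(p))
            return false;
    }
    return true;
}

// Hypothetical helper: checks that parts concatenate to s, that every
// part is simple, and that the parts are in non-increasing order.
bool is_lyndon_factorization(std::string const& s,
                             std::vector<std::string> const& parts) {
    std::string joined;
    for (std::size_t i = 0; i < parts.size(); i++) {
        if (!is_simple(parts[i]))
            return false;
        if (i > 0 && parts[i - 1] < parts[i])
            return false;
        joined += parts[i];
    }
    return joined == s;
}
```

For instance, `is_simple("ababb")` holds while `is_simple("abab")` does not, because the suffix $ab$ is smaller than $abab$.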
| 16 | + |
| 17 | +## Duval algorithm |
| 18 | + |
| 19 | +The Duval algorithm constructs the Lyndon factorization in $O(n)$ time using $O(1)$ additional memory. |
| 20 | + |
| 21 | +First let us introduce another notion: |
| 22 | +a string $t$ is called **pre-simple**, if it has the form $t = w w \dots w \overline{w}$, where $w$ is a simple string and $\overline{w}$ is a prefix of $w$ (possibly empty). |
| 23 | +A simple string is also pre-simple. |
| 24 | + |
| 25 | +The Duval algorithm is greedy. |
| 26 | +At any point during its execution, the string $s$ will actually be divided into three strings $s = s_1 s_2 s_3$, where the Lyndon factorization for $s_1$ is already found and finalized, the string $s_2$ is pre-simple (and we know the length of the simple string in it), and $s_3$ is completely untouched. |
| 27 | +In each iteration the Duval algorithm takes the first character of the string $s_3$ and tries to append it to the string $s_2$. |
| 28 | +If $s_2$ is no longer pre-simple, then the Lyndon factorization for some part of $s_2$ becomes known, and this part goes to $s_1$. |
| 29 | + |
| 30 | +Let's describe the algorithm in more detail. |
| 31 | +The pointer $i$ will always point to the beginning of the string $s_2$. |
| 32 | +The outer loop will be executed as long as $i < n$. |
| 33 | +Inside the loop we use two additional pointers: $j$, which points to the beginning of $s_3$, and $k$, which points to the character of $s_2$ that $s[j]$ is compared with. |
| 34 | +We want to add the character $s[j]$ to the string $s_2$, which requires a comparison with the character $s[k]$. |
| 35 | +There can be three different cases: |
| 36 | + |
| 37 | +- $s[j] = s[k]$: if this is the case, then adding the symbol $s[j]$ to $s_2$ doesn't violate its pre-simplicity. |
| 38 | + So we simply increment the pointers $j$ and $k$. |
| 39 | +- $s[j] > s[k]$: here, the string $s_2 + s[j]$ becomes simple. |
| 40 | + We can increment $j$ and reset $k$ back to the beginning of $s_2$, so that the next character can be compared with the beginning of the simple word. |
| 41 | +- $s[j] < s[k]$: the string $s_2 + s[j]$ is no longer pre-simple. |
| 42 | + Therefore we split the pre-simple string $s_2$ into its repeated simple strings and the remainder, which is possibly empty. |
| 43 | + Each of these simple strings has length $j - k$. |
| 44 | + In the next iteration we start again with the remainder of $s_2$. |
| 45 | + |
| 46 | +### Implementation |
| 47 | + |
| 48 | +Here we present the implementation of the Duval algorithm, which will return the desired Lyndon factorization of a given string $s$. |
| 49 | + |
| 50 | +```cpp duval_algorithm |
| 51 | +vector<string> duval(string const& s) { |
| 52 | +    int n = s.size(); |
| 53 | +    int i = 0; |
| 54 | +    vector<string> factorization; |
| 55 | +    while (i < n) { |
| 56 | +        int j = i + 1, k = i; |
| 57 | +        while (j < n && s[k] <= s[j]) { |
| 58 | +            if (s[k] < s[j]) |
| 59 | +                k = i; |
| 60 | +            else |
| 61 | +                k++; |
| 62 | +            j++; |
| 63 | +        } |
| 64 | +        while (i <= k) { |
| 65 | +            factorization.push_back(s.substr(i, j - k)); |
| 66 | +            i += j - k; |
| 67 | +        } |
| 68 | +    } |
| 69 | +    return factorization; |
| 70 | +} |
| 71 | +``` |
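
As a hedged illustration of the same loop, the following sketch reports each factor as a `(start, length)` pair instead of copying substrings, so that the only extra memory is the output itself. The name `duval_positions` is an assumption made for this example and does not appear in the implementation above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Sketch of a variant of the routine above: same pointers i, j, k,
// but each simple factor is recorded as (start index, length).
std::vector<std::pair<int, int>> duval_positions(std::string const& s) {
    int n = static_cast<int>(s.size());
    std::vector<std::pair<int, int>> factors;
    int i = 0;
    while (i < n) {
        int j = i + 1, k = i;
        // Extend the pre-simple string s[i..j) while it stays pre-simple.
        while (j < n && s[k] <= s[j]) {
            k = (s[k] < s[j]) ? i : k + 1;
            j++;
        }
        // The simple word has length j - k; emit every full copy of it.
        int len = j - k;
        while (i <= k) {
            factors.emplace_back(i, len);
            i += len;
        }
    }
    return factors;
}
```

For the string $abaab$ this yields the pairs $(0, 2)$ and $(2, 3)$, which correspond to the factorization $ab \cdot aab$.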
| 72 | + |
| 73 | +### Complexity |
| 74 | + |
| 75 | +Let us estimate the running time of this algorithm. |
| 76 | + |
| 77 | +The **outer while loop** does not exceed $n$ iterations, since at the end of each iteration $i$ increases. |
| 78 | +Also the second inner while loop runs in $O(n)$ in total, since it only outputs the final factorization. |
| 79 | + |
| 80 | +So we are only interested in the **first inner while loop**. |
| 81 | +How many iterations does it perform in the worst case? |
| 82 | +It's easy to see that the simple words that we identify in each iteration of the outer loop are longer than the remainder that we additionally compared. |
| 83 | +Therefore also the sum of the remainders will be smaller than $n$, which means that we only perform at most $O(n)$ iterations of the first inner while loop. |
| 84 | +In fact the total number of character comparisons will not exceed $4n - 3$. |
| 85 | + |
| 86 | +## Finding the smallest cyclic shift |
| 87 | + |
| 88 | +Let there be a string $s$. |
| 89 | +We construct the Lyndon factorization for the string $s + s$ (in $O(n)$ time). |
| 90 | +We will look for a simple string in the factorization that starts at a position less than $n$ (i.e. in the first instance of $s$) and ends at a position greater than or equal to $n$ (i.e. in the second instance of $s$). |
| 91 | +We claim that the starting position of this simple string is the beginning of the desired smallest cyclic shift. |
| 92 | +This can be easily verified using the definition of the Lyndon decomposition. |
| 93 | + |
| 94 | +The beginning of the simple block can be found easily: just remember the pointer $i$ at the beginning of each iteration of the outer loop, which indicates the beginning of the current pre-simple string. |
| 95 | + |
| 96 | +So we get the following implementation: |
| 97 | + |
| 98 | +```cpp smallest_cyclic_string |
| 99 | +string min_cyclic_string(string s) { |
| 100 | +    s += s; |
| 101 | +    int n = s.size(); |
| 102 | +    int i = 0, ans = 0; |
| 103 | +    while (i < n / 2) { |
| 104 | +        ans = i; |
| 105 | +        int j = i + 1, k = i; |
| 106 | +        while (j < n && s[k] <= s[j]) { |
| 107 | +            if (s[k] < s[j]) |
| 108 | +                k = i; |
| 109 | +            else |
| 110 | +                k++; |
| 111 | +            j++; |
| 112 | +        } |
| 113 | +        while (i <= k) |
| 114 | +            i += j - k; |
| 115 | +    } |
| 116 | +    return s.substr(ans, n / 2); |
| 117 | +} |
| 118 | +``` |
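
When experimenting with the routine above, a naive reference can help to cross-check results on small inputs. The sketch below is not the Duval-based method: it simply tries every rotation in $O(n^2)$ time, and the name `min_cyclic_string_naive` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Naive reference sketch: generate every cyclic shift of s by rotating
// left one character at a time and keep the lexicographically smallest.
std::string min_cyclic_string_naive(std::string const& s) {
    std::string best = s, cur = s;
    for (std::size_t shift = 1; shift < s.size(); shift++) {
        std::rotate(cur.begin(), cur.begin() + 1, cur.end());
        best = std::min(best, cur);
    }
    return best;
}
```

On short random strings its result should agree with `min_cyclic_string`; for example, the smallest cyclic shift of $baca$ is $abac$.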
| 119 | + |
| 120 | +## Problems |
| 121 | + |
| 122 | +- [UVA #719 - Glass Beads](https://uva.onlinejudge.org/index.php?option=onlinejudge&page=show_problem&problem=660) |