Commit 260ec53

committed
Translate article "Lyndon decomposition"
1 parent 2d1d267 commit 260ec53

File tree

4 files changed

+141
-1
lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ e-maxx-eng
55
[![Pull Requests](https://img.shields.io/github/issues-pr/e-maxx-eng/e-maxx-eng.svg)](https://github.com/e-maxx-eng/e-maxx-eng/pulls)
66
[![Closed Pull Requests](https://img.shields.io/github/issues-pr-closed/e-maxx-eng/e-maxx-eng.svg)](https://github.com/e-maxx-eng/e-maxx-eng/pulls?q=is%3Apr+is%3Aclosed)
77
[![Build Status](https://travis-ci.org/e-maxx-eng/e-maxx-eng.svg?branch=master)](https://travis-ci.org/e-maxx-eng/e-maxx-eng)
8-
[![Translation Progress](https://img.shields.io/badge/translation_progress-60.3%25-yellowgreen.svg)](https://github.com/e-maxx-eng/e-maxx-eng/wiki/Translation-Progress)
8+
[![Translation Progress](https://img.shields.io/badge/translation_progress-62.1%25-yellowgreen.svg)](https://github.com/e-maxx-eng/e-maxx-eng/wiki/Translation-Progress)
99

1010
Translation of http://e-maxx.ru into English
1111

src/index.md

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ especially popular in field of competitive programming.*
5050
- [Z-function](./string/z-function.html)
5151
- [Prefix function](./string/prefix-function.html)
5252
- [Finding all sub-palindromes in O(N)](./string/manacher.html)
53+
- [Lyndon factorization](./string/lyndon_factorization.html)
5354

5455
### Linear Algebra
5556
- [Gauss & System of Linear Equations](./linear_algebra/linear-system-gauss.html)

src/string/lyndon_factorization.md

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
1+
<!--?title Lyndon factorization -->
2+
# Lyndon factorization
3+
4+
## Lyndon factorization
5+
6+
First let us define the notion of the Lyndon factorization.
7+
8+
A string is called **simple** (or a Lyndon word), if it is strictly **smaller than** any of its own nontrivial **suffixes**.
9+
Examples of simple strings are: $a$, $b$, $ab$, $aab$, $abb$, $ababb$, $abcd$.
10+
It can be shown that a string is simple, if and only if it is strictly **smaller than** all its nontrivial **cyclic shifts**.
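The two characterizations above can be compared on small inputs with a brute-force sketch. The function names `is_simple_by_suffixes` and `is_simple_by_shifts` are hypothetical helpers, not part of the article's code, and the empty string is treated as not simple by assumption.

```cpp
#include <cstddef>
#include <string>

// Suffix definition: s is strictly smaller than every nontrivial suffix.
// Brute force, O(n^2) time; intended only for checking small examples.
bool is_simple_by_suffixes(const std::string& s) {
    if (s.empty()) return false;  // assumption: the empty string is not simple
    for (std::size_t p = 1; p < s.size(); ++p)
        if (s >= s.substr(p)) return false;
    return true;
}

// Cyclic-shift definition: s is strictly smaller than every nontrivial cyclic shift.
bool is_simple_by_shifts(const std::string& s) {
    if (s.empty()) return false;  // same assumption as above
    for (std::size_t p = 1; p < s.size(); ++p) {
        std::string shifted = s.substr(p) + s.substr(0, p);
        if (s >= shifted) return false;
    }
    return true;
}
```

Both functions accept the example strings listed above and reject strings such as $ba$, $aa$ and $abab$.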
11+
12+
Next, let there be a given string $s$.
13+
The **Lyndon factorization** of the string $s$ is a factorization $s = w_1 w_2 \dots w_k$, where all strings $w_i$ are simple, and they are in non-increasing order $w_1 \ge w_2 \ge \dots \ge w_k$.
14+
15+
It can be shown that for any string such a factorization exists and is unique.
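As an illustration of the definition, the following sketch checks whether a given list of strings is the Lyndon factorization of $s$. The names `is_simple` and `is_lyndon_factorization` are hypothetical and used only for this example; the simplicity check is a brute-force one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Brute-force simplicity check: w is strictly smaller than each nontrivial suffix.
bool is_simple(const std::string& w) {
    if (w.empty()) return false;  // assumption: the empty string is not simple
    for (std::size_t p = 1; p < w.size(); ++p)
        if (w >= w.substr(p)) return false;
    return true;
}

// True if the parts concatenate to s, every part is simple,
// and the parts are in non-increasing order w_1 >= w_2 >= ... >= w_k.
bool is_lyndon_factorization(const std::string& s,
                             const std::vector<std::string>& parts) {
    std::string joined;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (!is_simple(parts[i])) return false;
        if (i > 0 && parts[i - 1] < parts[i]) return false;
        joined += parts[i];
    }
    return joined == s;
}
```

For example, $abac, ab, ab$ passes the check for $s = abacabab$, while $ab, ac, ab, ab$ fails because $ab < ac$ breaks the ordering.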
16+
17+
## Duval algorithm
18+
19+
The Duval algorithm constructs the Lyndon factorization in $O(n)$ time using $O(1)$ additional memory.
20+
21+
First let us introduce another notion:
22+
a string $t$ is called **pre-simple**, if it has the form $t = w w \dots w \overline{w}$, where $w$ is a simple string and $\overline{w}$ is a prefix of $w$ (possibly empty).
23+
A simple string is also pre-simple.
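A minimal sketch of this definition, assuming the length $p$ of the candidate simple string $w$ is given: `is_pre_simple` and `is_simple` are hypothetical helper names, and the check is a brute-force one.

```cpp
#include <cstddef>
#include <string>

// Brute-force simplicity check: w is strictly smaller than each nontrivial suffix.
bool is_simple(const std::string& w) {
    if (w.empty()) return false;  // assumption: the empty string is not simple
    for (std::size_t q = 1; q < w.size(); ++q)
        if (w >= w.substr(q)) return false;
    return true;
}

// True if t = w w ... w w' where w = t[0..p) is simple
// and w' is a (possibly empty) prefix of w.
bool is_pre_simple(const std::string& t, std::size_t p) {
    if (p == 0 || p > t.size()) return false;
    if (!is_simple(t.substr(0, p))) return false;
    // Every later character must repeat the character p positions earlier.
    for (std::size_t i = p; i < t.size(); ++i)
        if (t[i] != t[i - p]) return false;
    return true;
}
```

With $w = abac$, the string $abacabacab$ is pre-simple for $p = 4$, and any simple string $t$ is pre-simple for $p = |t|$.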
24+
25+
The Duval algorithm is greedy.
26+
At any point during its execution, the string $s$ will actually be divided into three strings $s = s_1 s_2 s_3$, where the Lyndon factorization for $s_1$ is already found and finalized, the string $s_2$ is pre-simple (and we know the length of the simple string in it), and $s_3$ is completely untouched.
27+
In each iteration the Duval algorithm takes the first character of the string $s_3$ and tries to append it to the string $s_2$.
28+
If $s_2$ is no longer pre-simple, then the Lyndon factorization for some part of $s_2$ becomes known, and this part goes to $s_1$.
29+
30+
Let's describe the algorithm in more detail.
31+
The pointer $i$ will always point to the beginning of the string $s_2$.
32+
The outer loop will be executed as long as $i < n$.
33+
Inside the loop we use two additional pointers: $j$, which points to the beginning of $s_3$, and $k$, which points to the character of $s_2$ that we are currently comparing against.
34+
We want to add the character $s[j]$ to the string $s_2$, which requires a comparison with the character $s[k]$.
35+
There can be three different cases:
36+
37+
- $s[j] = s[k]$: if this is the case, then adding the symbol $s[j]$ to $s_2$ doesn't violate its pre-simplicity.
38+
So we simply increment the pointers $j$ and $k$.
39+
- $s[j] > s[k]$: here, the string $s_2 + s[j]$ becomes simple.
40+
We can increment $j$ and reset $k$ back to the beginning of $s_2$, so that the next character is compared with the beginning of the simple word.
41+
- $s[j] < s[k]$: the string $s_2 + s[j]$ is no longer pre-simple.
42+
Therefore we split the pre-simple string $s_2$ into its simple strings and a remainder, which may be empty.
43+
Each of these simple strings has the length $j - k$.
44+
In the next iteration we start again with the remainder of $s_2$.
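The three cases can be spelled out one by one in code. The following is a sketch under the assumption that only the lengths of the factors are wanted; the name `lyndon_factor_lengths` is hypothetical and not part of the article's code.

```cpp
#include <string>
#include <vector>

// Returns the lengths of the Lyndon factors of s, in order.
std::vector<int> lyndon_factor_lengths(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<int> lengths;
    int i = 0;  // beginning of s_2
    while (i < n) {
        int j = i + 1;  // beginning of s_3
        int k = i;      // character of s_2 compared against s[j]
        while (j < n) {
            if (s[j] == s[k]) {
                ++k;    // case 1: s_2 + s[j] stays pre-simple
            } else if (s[j] > s[k]) {
                k = i;  // case 2: s_2 + s[j] becomes simple
            } else {
                break;  // case 3: s_2 + s[j] is no longer pre-simple
            }
            ++j;
        }
        // Split s_2 into simple strings of length j - k; the remainder stays in s_2.
        int len = j - k;
        while (i <= k) {
            lengths.push_back(len);
            i += len;
        }
    }
    return lengths;
}
```

For $s = abacabab$ this yields the lengths $4, 2, 2$, corresponding to the factors $abac$, $ab$, $ab$.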
45+
46+
### Implementation
47+
48+
Here we present the implementation of the Duval algorithm, which will return the desired Lyndon factorization of a given string $s$.
49+
50+
```cpp duval_algorithm
vector<string> duval(string const& s) {
    int n = s.size();
    int i = 0;  // beginning of the pre-simple string s_2
    vector<string> factorization;
    while (i < n) {
        // j: beginning of s_3, k: character of s_2 that s[j] is compared with
        int j = i + 1, k = i;
        while (j < n && s[k] <= s[j]) {
            if (s[k] < s[j])
                k = i;  // s_2 + s[j] is simple, restart the comparison
            else
                k++;    // s_2 + s[j] is still pre-simple
            j++;
        }
        // output the simple strings of length j - k contained in s_2
        while (i <= k) {
            factorization.push_back(s.substr(i, j - k));
            i += j - k;
        }
    }
    return factorization;
}
```
72+
73+
### Complexity
74+
75+
Let us estimate the running time of this algorithm.
76+
77+
The **outer while loop** does not exceed $n$ iterations, since at the end of each iteration $i$ increases.
78+
Also the second inner while loop runs in $O(n)$ in total, since it only outputs the final factorization.
79+
80+
So we are only interested in the **first inner while loop**.
81+
How many iterations does it perform in the worst case?
82+
It's easy to see that the simple words that we identify in each iteration of the outer loop are together longer than the remainder that we compared in addition.
83+
Therefore the sum of the remainders is also smaller than $n$, which means that the first inner while loop performs only $O(n)$ iterations in total.
84+
In fact the total number of character comparisons will not exceed $4n - 3$.
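The argument above can be checked empirically with an instrumented copy of the algorithm. This sketch counts iterations of the first inner loop, not individual character comparisons, so it illustrates the $O(n)$ bound (fewer than $2n$ iterations, by the argument above) rather than the $4n - 3$ figure. The name `count_inner_iterations` is hypothetical.

```cpp
#include <string>

// Runs the Duval loop structure on s without storing the factors and
// returns how many times the body of the first inner loop is executed.
long long count_inner_iterations(const std::string& s) {
    int n = static_cast<int>(s.size());
    long long iterations = 0;
    int i = 0;
    while (i < n) {
        int j = i + 1, k = i;
        while (j < n && s[k] <= s[j]) {
            ++iterations;
            if (s[k] < s[j])
                k = i;
            else
                ++k;
            ++j;
        }
        // Skip over the simple strings that would be output here.
        while (i <= k)
            i += j - k;
    }
    return iterations;
}
```

For $s = abacabab$ with $n = 8$ the function returns $9$: six iterations while building $abacaba$, and three more for the remaining $abab$.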
85+
86+
## Finding the smallest cyclic shift
87+
88+
Let there be a string $s$.
89+
We construct the Lyndon factorization for the string $s + s$ (in $O(n)$ time).
90+
We will look for a simple string in the factorization that starts at a position less than $n$ (i.e. it starts in the first instance of $s$) and ends at a position greater than or equal to $n$ (i.e. in the second instance of $s$).
91+
We claim that the start position of this simple string is the beginning of the desired smallest cyclic shift.
92+
This can be easily verified using the definition of the Lyndon factorization.
93+
94+
The beginning of the simple block can be found easily: just remember the pointer $i$ at the beginning of each iteration of the outer loop, which indicates the beginning of the current pre-simple string.
95+
96+
So we get the following implementation:
97+
98+
```cpp smallest_cyclic_string
string min_cyclic_string(string s) {
    s += s;
    int n = s.size();  // length of the doubled string
    int i = 0, ans = 0;
    while (i < n / 2) {
        ans = i;  // remember where the current pre-simple string begins
        int j = i + 1, k = i;
        while (j < n && s[k] <= s[j]) {
            if (s[k] < s[j])
                k = i;
            else
                k++;
            j++;
        }
        // skip the simple strings of length j - k without storing them
        while (i <= k)
            i += j - k;
    }
    return s.substr(ans, n / 2);
}
```
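For cross-checking on small inputs, a brute-force sketch that simply compares all cyclic shifts can be useful. It runs in $O(n^2)$ time, and the name `min_cyclic_string_naive` is hypothetical, not part of the article's code.

```cpp
#include <cstddef>
#include <string>

// Tries every cyclic shift of s and keeps the lexicographically smallest one.
std::string min_cyclic_string_naive(const std::string& s) {
    std::string best = s;
    for (std::size_t p = 1; p < s.size(); ++p) {
        std::string shifted = s.substr(p) + s.substr(0, p);
        if (shifted < best)
            best = shifted;
    }
    return best;
}
```

For $s = abacabab$ the smallest shift is $abababac$, the same value that the test accompanying this article expects from `min_cyclic_string`.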
119+
120+
## Problems
121+
122+
- [UVA #719 - Glass Beads](https://uva.onlinejudge.org/index.php?option=onlinejudge&page=show_problem&problem=660)

test/test_lyndon_factorization.cpp

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
#include <cassert>
#include <vector>
#include <string>
using namespace std;

#include "duval_algorithm.h"
#include "smallest_cyclic_string.h"

int main() {
    string s = "abacabab";
    vector<string> exp_fact = {"abac", "ab", "ab"};
    vector<string> fact = duval(s);
    assert(exp_fact == fact);

    string exp_min_shift = "abababac";
    assert(exp_min_shift == min_cyclic_string(s));
}
