Commit 10d5822

Doc: add a little about LACON execution to src/backend/regex/README.
I wrote this while thinking about a possible optimization, but it's a useful description of the existing code regardless of whether the optimization ever happens. So push it separately.
1 parent 375aed3 commit 10d5822

1 file changed: +33 -0 lines changed


src/backend/regex/README

Lines changed: 33 additions & 0 deletions
@@ -438,3 +438,36 @@ BOS/BOL/EOS/EOL adjacent to the pre-state and post-state. So a finished
 NFA for a pattern without anchors or adjacent-character constraints will
 have pre-state outarcs for RAINBOW (all possible character colors) as well
 as BOS and BOL, and likewise post-state inarcs for RAINBOW, EOS, and EOL.
+Also note that LACON arcs will never connect to the pre-state
+or post-state.
+
+
+Look-around constraints (LACONs)
+--------------------------------
+
+The regex compiler doesn't have much intelligence about LACONs; it just
+constructs a sub-NFA representing the pattern that the constraint says to
+match or not match, and puts a LACON arc referencing that sub-NFA into the
+main NFA. At runtime, the executor applies the sub-NFA at each point in
+the string where the constraint is relevant, and then traverses or doesn't
+traverse the arc. ("Traversal" means including the arc's to-state in the
+set of NFA states that are considered active at the next character.)
+
+The actual basic matching cycle of the executor is
+1. Identify the color of the next input character, then advance over it.
+2. Apply the DFA to follow all the matching "plain" arcs of the NFA.
+   (Notionally, the previous DFA state represents the set of states the
+   NFA could have been in before the character, and the new DFA state
+   represents the set of states the NFA could be in after the character.)
+3. If there are any LACON arcs leading out of any of the new NFA states,
+   apply each LACON constraint starting from the new next input character
+   (while not actually consuming any input). For each successful LACON,
+   add its to-state to the current set of NFA states. If any such
+   to-state has outgoing LACON arcs, process those in the same way.
+   (Mathematically speaking, we compute the transitive closure of the
+   set of states reachable by successful LACONs.)
+
+Thus, LACONs are always checked immediately after consuming a character
+via a plain arc. This is okay because the NFA's "pre" state only has
+plain out-arcs, so we can always consume a character (possibly a BOS
+pseudo-character as described above) before we need to worry about LACONs.
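
Illustration (not part of the commit): the three-step cycle described in the
added README text can be sketched as a toy simulation. This is only a minimal
sketch under several assumptions. It tracks explicit sets of NFA states
instead of building a DFA, it treats each character as its own color, and it
models a LACON as a plain predicate rather than a sub-NFA. All names here
(lacon_closure, step, run, BOS, PLAIN, LACONS, next_is_b) are hypothetical and
do not appear in the PostgreSQL sources.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy representation, not the PostgreSQL data structures.
Lacon = Callable[[str, int], bool]            # inspects text at pos, consumes nothing
PlainArcs = Dict[Tuple[int, str], Set[int]]   # (from-state, color) -> to-states
LaconArcs = Dict[int, List[Tuple[Lacon, int]]]  # from-state -> [(constraint, to-state)]

BOS = "<BOS>"  # stand-in for the beginning-of-string pseudo-character


def lacon_closure(states: Set[int], lacon_arcs: LaconArcs,
                  text: str, pos: int) -> Set[int]:
    """Step 3: add every state reachable through successful LACON arcs."""
    result = set(states)
    worklist = list(states)
    while worklist:
        state = worklist.pop()
        for constraint, to_state in lacon_arcs.get(state, []):
            # The constraint looks ahead from pos but consumes no input.
            if to_state not in result and constraint(text, pos):
                result.add(to_state)
                worklist.append(to_state)  # its own LACON arcs get processed too
    return result


def step(states: Set[int], color: str, plain_arcs: PlainArcs,
         lacon_arcs: LaconArcs, text: str, next_pos: int) -> Set[int]:
    """Steps 2 and 3: follow plain arcs, then apply LACONs at next_pos."""
    after_plain: Set[int] = set()
    for state in states:
        after_plain |= plain_arcs.get((state, color), set())
    return lacon_closure(after_plain, lacon_arcs, text, next_pos)


def run(text: str, plain_arcs: PlainArcs, lacon_arcs: LaconArcs,
        pre_state: int) -> List[Set[int]]:
    """Return the active state set after BOS and after each character."""
    # The pre-state has only plain out-arcs, so BOS is consumed first.
    states = step({pre_state}, BOS, plain_arcs, lacon_arcs, text, 0)
    trace = [states]
    for pos, ch in enumerate(text):
        # Step 1: the character itself serves as its color in this toy model.
        states = step(states, ch, plain_arcs, lacon_arcs, text, pos + 1)
        trace.append(states)
    return trace


def next_is_b(text: str, pos: int) -> bool:
    """Toy positive lookahead: the next input character is 'b'."""
    return text[pos:pos + 1] == "b"


# Toy NFA resembling a(?=b)b: 0 -BOS-> 1 -a-> 2 -LACON-> 3 -b-> 4
PLAIN: PlainArcs = {(0, BOS): {1}, (1, "a"): {2}, (3, "b"): {4}}
LACONS: LaconArcs = {2: [(next_is_b, 3)]}
```

With these toy tables, run("ab", PLAIN, LACONS, 0) activates state 3 right
after 'a' is consumed, because the lookahead is evaluated at the position of
the following 'b'. With "ac" the LACON fails and state 3 is never added. No
LACON arc leaves the pre-state (state 0), which mirrors the invariant stated
in the README text.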
