0% found this document useful (0 votes)
79 views5 pages

Xpbctbxabpqxctbpg Abxab: The Boyer-Moore Algorithm Right-To-Left Scan

The Boyer-Moore string matching algorithm achieves sublinear run time on average by using three main ideas: the right-to-left scan, bad character rule, and good suffix rule. It works by shifting the pattern string P to the right by more than one character when a mismatch occurs, using these rules to determine the size of the shift. The bad character rule and good suffix rule allow for larger shifts on average compared to always shifting by one.

Uploaded by

A'ch Réf
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
79 views5 pages

Xpbctbxabpqxctbpg Abxab: The Boyer-Moore Algorithm Right-To-Left Scan

The Boyer-Moore string matching algorithm achieves sublinear run time on average by using three main ideas: the right-to-left scan, bad character rule, and good suffix rule. It works by shifting the pattern string P to the right by more than one character when a mismatch occurs, using these rules to determine the size of the shift. The bad character rule and good suffix rule allow for larger shifts on average compared to always shifting by one.

Uploaded by

A'ch Réf
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

The Boyer-Moore Algorithm

Right-to-left scan

The string matching algorithm of choice.

12345678901234567
xpbctbxabpqxctbpg
tpabxab

Achieves expected sublinear run time using three


ideas:
Right-to-left scan
Bad character rule
Good suffix rule

Boyer-Moore

When a mismatch occurs, shift P right by some


amount.
If we always shift by one, run time is O(nm). Instead,
we try to shift by more than one, if possible.

Page 1

Definition. For each x in the alphabet, R(x) is the


index of the rightmost occurrence of x in P. R(x) is 0 if
x does not occur in P.
Example. = {a, b, c, d}

P = abadab

Boyer-Moore

Page 2

The bad character shift rule


Suppose that for some alignment of P against T, the
rightmost n i characters of P are matched, but P[i]
mismatches. Assume the mismatch is with T[k].
Then, shift P right by max[1, i R(T[k])] places.

x a b c d
R(x) 5 6 0 4

bacbcdabaaba
abadab
abadab
abadab
abadab

Boyer-Moore

Page 3

Boyer-Moore

Shift by 2
Shift by 3
Shift by 1

Page 4

The extended bad character shift rule

The good suffix rule. Suppose that for some


alignment of P and T, substring t of T matches a suffix

When a mismatch occurs at P[i] and T[k] and T[k] = x,


shift P to the right so that the closest x in P to the left
of P[i] is aligned with T[k].

of P, but a mismatch occurs at the next position. Find


the rightmost copy t of t in P such that t is not a
suffix of P and the character to the left of t in P differs
from the character to the left of t in P. Shift P so that
t in P is aligned with t in T.

T
P

x
P

x
x

x
y

T
x

t
P

Boyer-Moore

Page 5

. . . If there is no such t, shift the left end of P past the


left end of t in T by the least amount so that a prefix
of the shifted pattern matches a suffix of t in T.

Boyer-Moore

t
P

t
y

t
Page 6

P
s

. . . If no such shift is possible, shift P by n places to


the right (i.e., past t).

Boyer-Moore

s
y

Page 7

Boyer-Moore

Page 8

If an occurrence of P is found, shift P by the least


amount so that a proper prefix of the shifted P matches

123456789012345678
prstabstubabvqxrst
*
qcabdabdab
1234567890

a suffix of the occurrence of P in T. If no such shift is


possible, shift P by n places to the right (i.e., past t).

123456789012345678
prstabstubabvqxrst
qcabdabdab
1234567890

Boyer-Moore

Page 9

Definition. For each i, L(i) is the largest position less


than n such that P[i.. n] matches a suffix of
P[1..L(i)]. L(i) is 0 if no such position exists.

L(i)

Boyer-Moore

Definition. For each i, L(i) is the largest position less


than n such that P[i..n] matches a suffix of
P[1..L(i)] and such that the character preceding that
suffix is not equal to P[i1]. L(i) is 0 if no such
position exists.

n
P

Boyer-Moore

Page 10

Page 11

Boyer-Moore

z
1

y
L(i)

y
i
n

Page 12

Definition. Nj(P) is the length of the longest suffix of


P[1..j] that is also a suffix of the full string P.

Example

Nj(P)
P

Observation. N(P) is the reverse of Z(P). That is,


Nj(P) = Znj+1(Pr), where Pr is the reversal of P.

Example
j
P[j]
Nj(P)

1 2 3 4 5 6 7 8 9
c a b d a b d a b
0 0 2 0 0 5 0 0 *

Boyer-Moore

j
P[j]
Nj(P)

1 2 3 4 5 6 7 8 9
c a b d a b d a b
0 0 2 0 0 5 0 0 *

Pr[j]
Zj(Pr)

b a d b a d b a c
* 0 0 5 0 0 2 0 0

Thus, N(P) can be computed in O(n) time.


Page 13

Theorem.
L(i) is the largest j such that Nj(P) |P[i..n]|.
L(i) is the largest j such that Nj(P)= |P[i..n]|.

Boyer-Moore

Page 14

Definition. l(i) is the length of the longest suffix of


P[i..n] that is also a prefix of P. If no such suffix
exists, l(i) = 0.
l(i)

Computing the L(i)s


for i 1 to n do
L(i) 0
for j 1 to n1 do
i nNj(P) + 1
L(i) j

Theorem. l(i) equals the largest j |P[i..n]| such


that Nj(P) = j.

Thus, L can be computed in O(n) time.


Boyer-Moore

Page 15

Boyer-Moore

Page 16

Using L and l for the good suffix rule


If a mismatch occurs at position i 1 of P, then
If L(i) > 0, shift P right by n L(i) places
If L(i) = 0, shift P right by n l(i) places
If an occurrence of P is found, then shift P by
n l(2) places.
If P[n] mismatches, shift P one place to the right.

Boyer-Moore

Page 17

With the strong good suffix rule alone, the worst-case


run time of Boyer-Moore is
O(m) if P is not in T (Knuth, Morris, Pratt 1977,
Guibas & Odlyzko 1980, Cole 1994)
O(nm) if P is in T, but can be modified to achieve
O(n+m) time in all cases (Galil 1979, Apostolico
and Giancarlo 1986)
With the bad character rule alone
Worst-case time is O(nm)
Expected time on random strings is sublinear
Sublinear time observed in practice
Boyer-Moore

Page 19

Boyer-Moore(P,T)
compute L(i) and L(i) for each position i of P
compute R(x) for each x in the alphabet
kn
while k m do
i n; h k
while i > 0 and P[i] = T[h] do
i--; h-if i = 0 then
report occurrence of P in T ending at T[k]
k k + n l(2)
else
shift P (increase k) by the maximum
amount determined by the bad character
rule and the good suffix rule
Boyer-Moore

Page 18

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy