Boyer-Moore Algorithm
The Boyer-Moore algorithm is an efficient string-matching algorithm that finds all occurrences of a
pattern P of length m in a text T of length n. It relies on heuristics that let it skip sections of the
text, which makes it much faster in practice than naive string-matching algorithms.
Key Features
1. Preprocessing Phase:
• Constructs two tables, one for the Bad Character Rule and one for the Good Suffix Rule, that determine how far the pattern can safely skip along the text.
2. Matching Phase:
• Aligns P with T and compares characters from right to left, starting at the rightmost character of the pattern.
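3. Efficiency:
• Often sublinear in practice: in the best case only about n/m characters of the text are examined. In the worst case, for example when T and P consist of a single repeated character, the basic version needs O(nm) comparisons to report all occurrences.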
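One way to see where the speed-up comes from is to count character comparisons. The sketch below is a simplification, not the full algorithm: it compares a naive left-to-right matcher against right-to-left matching that uses only the Bad Character Rule. The helper names `comparisons_naive` and `comparisons_bad_character` are hypothetical, and indices are assumed to be 0-based. The test text is built from a character that never occurs in the pattern, so each alignment costs a single comparison and the pattern jumps m positions at a time.

```python
from __future__ import annotations


def comparisons_naive(text: str, pattern: str) -> int:
    """Count character comparisons made by naive left-to-right matching."""
    count = 0
    for s in range(len(text) - len(pattern) + 1):
        for j in range(len(pattern)):
            count += 1
            if text[s + j] != pattern[j]:
                break
    return count


def comparisons_bad_character(text: str, pattern: str) -> int:
    """Count comparisons for right-to-left matching using only the Bad Character Rule."""
    # Later indices overwrite earlier ones, so each character maps to its last occurrence.
    last = {char: index for index, char in enumerate(pattern)}
    m, n = len(pattern), len(text)
    count, s = 0, 0
    while s <= n - m:
        j = m - 1
        while j >= 0:
            count += 1
            if text[s + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            s += 1  # full match: simplest safe shift when the Good Suffix Rule is ignored
        else:
            # Absent characters get index -1, so the pattern moves past them entirely.
            s += max(1, j - last.get(text[s + j], -1))
    return count


text = "x" * 1000
pattern = "abcde"
print(comparisons_naive(text, pattern))          # 996
print(comparisons_bad_character(text, pattern))  # 200
```

With n = 1000 and m = 5, the right-to-left version makes n/m = 200 comparisons, while the naive matcher makes one per alignment (996).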
Key Heuristics
1. Bad Character Rule:
• Definition:
• When a mismatch occurs at pattern position i, shift the pattern right so that the mismatched text character aligns with its last occurrence in the pattern. If that occurrence lies to the right of i, shift by one position instead.
• If the character does not occur in the pattern, shift the pattern entirely past the mismatched character.
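• Key Insight:
• A mismatch on a character that does not occur in the pattern lets the search skip i + 1 positions at once, so large alphabets tend to produce long shifts.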
• Example:
• Text: ABCDABCD
• Pattern: ABCD
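• Shift: With the pattern aligned at text position 3 (0-based, over DABC), the first right-to-left comparison finds C in the text against D in the pattern. The last C in the pattern is at index 2, so the pattern shifts right by 3 − 2 = 1, aligning the two Cs and producing a full match at position 4.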
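The Bad Character Rule can be sketched in a few lines of Python. This is a minimal version that assumes 0-based indices and a dictionary keyed by character. The names `last_occurrence` and `bad_character_shift` are hypothetical. The snippet replays the example: pattern ABCD aligned at text position 3.

```python
from __future__ import annotations


def last_occurrence(pattern: str) -> dict[str, int]:
    """Map each character to the index of its last occurrence in pattern."""
    table = {}
    for index, char in enumerate(pattern):
        table[char] = index  # later occurrences overwrite earlier ones
    return table


def bad_character_shift(table: dict[str, int], mismatched_char: str, i: int) -> int:
    """Shift after a mismatch at pattern index i against text character mismatched_char.

    Characters absent from the pattern get index -1, so the pattern moves past the
    mismatched character (shift i + 1). If the last occurrence lies to the right
    of i, fall back to a shift of 1.
    """
    last = table.get(mismatched_char, -1)
    return max(1, i - last)


# Example from the text: pattern ABCD aligned at text position 3 (over DABC).
text, pattern = "ABCDABCD", "ABCD"
table = last_occurrence(pattern)
s = 3
i = len(pattern) - 1
while i >= 0 and pattern[i] == text[s + i]:
    i -= 1
# Mismatch at i = 3: the text has 'C' where the pattern has 'D'.
shift = bad_character_shift(table, text[s + i], i)
print(i, text[s + i], shift)  # 3 C 1
```

The dictionary only stores characters that occur in the pattern. Array-based versions instead allocate one entry per alphabet symbol.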
2. Good Suffix Rule:
• Definition:
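• After a mismatch, shift the pattern right so that the already-matched suffix aligns with another occurrence of the same suffix further left in the pattern. If there is none, shift so that the longest prefix of the pattern that is also a suffix of the matched part lines up with the end of that part.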
• Key Insight:
• Utilize the already matched portion of the pattern to skip unnecessary comparisons.
• Example:
• Text: ABCDABCD
• Pattern: BCDAB
• Matched suffix: AB
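• Shift: The pattern contains no other occurrence of AB, so the rule falls back to the longest prefix of the pattern that is also a suffix of AB, namely B, and shifts the pattern right by 5 − 1 = 4 positions.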
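The Good Suffix table can be precomputed in O(m) time. The Python sketch below assumes the commonly used "strong" variant, in which the other occurrence of the matched suffix must be preceded by a character different from the one that mismatched. It also assumes the classic two-pass construction with a border array. The name `good_suffix_shifts` and the table layout are illustrative choices: entry k holds the shift when `pattern[k:]` has matched and index k − 1 mismatched, and entry 0 holds the shift after a full match.

```python
from __future__ import annotations


def good_suffix_shifts(pattern: str) -> list[int]:
    """Strong Good Suffix shift table (hypothetical helper, 0-based indices)."""
    m = len(pattern)
    shift = [0] * (m + 1)
    # border[i]: start index of the widest proper border of pattern[i:] (m = empty border).
    border = [0] * (m + 1)
    i, j = m, m + 1
    border[i] = j
    # Pass 1: pattern[j:] also occurs at index i, preceded by a different
    # character, so a mismatch just before pattern[j:] can shift by j - i.
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Pass 2: otherwise, align the widest prefix of the pattern that is also
    # a suffix of the matched part; narrower borders take over as k grows.
    j = border[0]
    for k in range(m + 1):
        if shift[k] == 0:
            shift[k] = j
        if k == j:
            j = border[j]
    return shift


table = good_suffix_shifts("BCDAB")
print(table)     # [4, 4, 4, 4, 4, 1]
print(table[3])  # 4: suffix AB (pattern[3:]) matched, mismatch at index 2
```

Entry 0 is also 4 here, because B is the only non-empty proper prefix of BCDAB that is also a suffix.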
Algorithm Steps
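1. Preprocessing Phase:
• Build the Bad Character table and the Good Suffix table from P.
2. Matching Phase:
• Align P with the start of T and compare characters from right to left.
• On a mismatch:
• Compute the shifts using the Bad Character Rule and Good Suffix Rule, and move the pattern right by the larger of the two.
• On a complete match:
• Report the occurrence, then shift the pattern using the Good Suffix Rule (treating the whole pattern as the matched suffix) and continue searching.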
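The algorithm steps can be combined into a single search loop. This is a minimal Python sketch under several assumptions: 0-based indices, the strong Good Suffix table from the classic border-array construction, and the common choice of shifting by the larger of the two rule shifts on a mismatch and by the full-match Good Suffix shift after an occurrence. `boyer_moore_search` and its helper functions are hypothetical names.

```python
from __future__ import annotations


def _bad_character_table(pattern: str) -> dict[str, int]:
    # Last index of each character (later indices overwrite earlier ones).
    return {char: index for index, char in enumerate(pattern)}


def _good_suffix_table(pattern: str) -> list[int]:
    # shift[k]: move when pattern[k:] matched and index k - 1 mismatched;
    # shift[0]: move after a full match. Strong variant, two-pass construction.
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    j = border[0]
    for k in range(m + 1):
        if shift[k] == 0:
            shift[k] = j
        if k == j:
            j = border[j]
    return shift


def boyer_moore_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m, n = len(pattern), len(text)
    last = _bad_character_table(pattern)
    good_suffix = _good_suffix_table(pattern)
    matches = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare right to left until a mismatch or a complete match.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += good_suffix[0]
        else:
            # The bad-character shift may be zero or negative; the good-suffix
            # shift is always at least 1, so max() still moves the pattern forward.
            bad_char = j - last.get(text[s + j], -1)
            s += max(good_suffix[j + 1], bad_char)
    return matches


print(boyer_moore_search("ABCDABCD", "ABCD"))   # [0, 4]
print(boyer_moore_search("ABCDABCD", "BCDAB"))  # [1]
```

Taking the maximum is safe because each rule on its own never skips past an occurrence.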
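Advantages
1. Skips large portions of the text, so it is much faster in practice than naive string matching, especially for long patterns over large alphabets.
Disadvantages
1. Preprocessing overhead: both tables must be built before the search starts, which costs extra time and memory (the Bad Character table grows with the alphabet size).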