DSA String Matching - Part 3
The Boyer-Moore algorithm is used to determine whether a given pattern is present
within a specified text. It follows a backward approach: at each alignment, the pattern is compared with the text from right to left, starting at the pattern's last character.
The task of searching for a particular pattern within a given string is known as the pattern
searching problem. For example, if the text is "THIS IS A SAMPLE TEXT" and the pattern
is "TEXT", then the output should be 17, which is the (0-based) index of the first occurrence of the pattern
in the given text.
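As a quick check of the example above, assuming 0-based indexing, Python's built-in `str.find` solves the same problem: it returns the index of the first occurrence of the pattern, or -1 if there is none.

```python
# str.find returns the 0-based index of the first occurrence, or -1.
text = "THIS IS A SAMPLE TEXT"
pattern = "TEXT"
print(text.find(pattern))  # prints 17
```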
This algorithm was developed by Robert Boyer and J Strother Moore in 1977. It is
considered one of the most efficient and widely used algorithms for single-pattern matching in practice.
CS4301-DSA I Department of CSE 2025-2026
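The algorithm uses two heuristics to decide how far the pattern can be shifted after a mismatch.
Bad character heuristic − When a mismatch occurs, the text character that failed to match is called the bad character. If the bad character occurs in the pattern, the pattern is shifted so that its last occurrence in the
pattern aligns with the bad character in the text. If it does not, then the pattern is shifted
past the bad character.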
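The bad character rule can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the function names `bad_character_table` and `search_bad_character` are hypothetical, the table simply records the last index of each character in the pattern, and the search uses only this one heuristic (no good suffix rule).

```python
def bad_character_table(pattern):
    """Map each character to the index of its last occurrence in pattern."""
    last = {}
    for i, ch in enumerate(pattern):
        last[ch] = i  # later occurrences overwrite earlier ones
    return last


def search_bad_character(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    Sketch using only the bad character heuristic; characters are
    compared from right to left at each alignment.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    last = bad_character_table(pattern)
    s = 0  # current alignment: pattern[0] sits under text[s]
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            return s  # every character matched
        bad = text[s + j]
        # Align the last occurrence of the bad character with it, or move
        # past it if it is absent (-1). Always shift by at least 1.
        s += max(1, j - last.get(bad, -1))
    return -1
```

For the earlier example, `search_bad_character("THIS IS A SAMPLE TEXT", "TEXT")` returns 17 after trying only six alignments. The `max(1, ...)` guard is needed because the last occurrence of the bad character may lie to the right of the mismatch position, which would otherwise give a negative shift.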
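Good suffix heuristic − This heuristic uses a second table of precomputed shifts, which helps
when the bad character heuristic gives only a small shift. The part of the pattern that already matched the text before the mismatch is called the good suffix. The pattern is shifted so that another occurrence of this suffix in the pattern (or, if there is none, the longest prefix of the pattern that matches the end of the good suffix) lines up with the matched text. The search then continues from the new alignment.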
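One common way to precompute the good suffix table is sketched below, assuming the "strong" form of the rule (the reused occurrence of the suffix must be preceded by a different character than the one that mismatched). The function name `good_suffix_shifts` is hypothetical. After a mismatch at pattern index j, the shift is `shift[j + 1]`; after a full match, it is `shift[0]`.

```python
def good_suffix_shifts(pattern):
    """Return the strong good suffix shift table for pattern.

    shift[j + 1] is the shift after a mismatch at pattern index j;
    shift[0] is the shift after a complete match.
    """
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start of widest border of pattern[i:]
    i, j = m, m + 1
    border[i] = j
    # Case 1: the matched suffix reappears earlier in the pattern,
    # preceded by a different character.
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Case 2: only a prefix of the pattern matches the end of the suffix.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


print(good_suffix_shifts("ABAB"))  # prints [2, 2, 2, 4, 1]
```

For "ABAB", a mismatch after matching "AB" (j = 1) gives a shift of 2, which lines the prefix "AB" up with the matched "AB" in the text.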
Example:
The Boyer-Moore algorithm combines these two heuristics by choosing the maximum
shift suggested by them at each step. At every alignment, characters are compared starting
from the last character of the pattern and moving towards the first. If all characters match, an
occurrence has been found; to find further occurrences, the pattern is shifted and the search continues. If there
is a mismatch, both heuristics are applied and the pattern is shifted by the larger of the two suggested shifts. The algorithm stops
when it finds a complete match (if only the first occurrence is needed) or when the pattern moves past the end of the text.
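Putting both heuristics together gives a compact version of the algorithm. This is a sketch under stated assumptions, not the original authors' code: the names are hypothetical, the good suffix table uses the strong rule, the bad character table stores each character's last index, and the function reports every occurrence (the first one is `result[0]`). An empty pattern returns an empty list as a design choice.

```python
def good_suffix_shifts(pattern):
    """Strong good suffix table: shift[j + 1] after a mismatch at j,
    shift[0] after a full match."""
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)
    i, j = m, m + 1
    border[i] = j
    while i > 0:  # suffix reappears earlier in the pattern
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    j = border[0]
    for i in range(m + 1):  # only a prefix matches the end of the suffix
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def boyer_moore_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    last = {ch: i for i, ch in enumerate(pattern)}  # bad character table
    good = good_suffix_shifts(pattern)
    matches = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:  # right to left
            j -= 1
        if j < 0:
            matches.append(s)
            s += good[0]  # shift after a complete match
        else:
            bad_shift = j - last.get(text[s + j], -1)
            s += max(bad_shift, good[j + 1])  # take the larger shift
    return matches


print(boyer_moore_search("THIS IS A SAMPLE TEXT", "TEXT"))  # prints [17]
```

Since the good suffix shift is always at least 1, taking the maximum also guarantees that the search always moves forward, even when the bad character shift is zero or negative.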
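Let n be the length of the text and m the length of the pattern. The basic Boyer-Moore algorithm has a worst-case time complexity of O(nm), but in practice it often
performs much better than that. In the best case it is sublinear, needing only O(n/m) comparisons, which means that it skips many characters in the text without
comparing them. This happens when the characters examined in the text do not occur in the pattern, so each mismatch shifts the pattern by a full m positions; it is most likely when the alphabet is
large compared with the pattern.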
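The gap between the best and worst cases can be seen by counting character comparisons. The sketch below is a simplification: the hypothetical `count_comparisons` uses only the bad character rule, and the two inputs were chosen for illustration. With n = 1000 and m = 4, a pattern whose characters never appear in the text needs about n/m comparisons, while a pattern that almost matches at every position needs about nm.

```python
def count_comparisons(text, pattern):
    """Count character comparisons made by a bad-character-only search."""
    n, m = len(text), len(pattern)
    last = {ch: i for i, ch in enumerate(pattern)}
    comparisons = 0
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0:
            comparisons += 1
            if pattern[j] != text[s + j]:
                break
            j -= 1
        if j < 0:
            s += 1  # full match: simplest safe shift
        else:
            s += max(1, j - last.get(text[s + j], -1))
    return comparisons


print(count_comparisons("A" * 1000, "BCDE"))  # best case: prints 250
print(count_comparisons("A" * 1000, "BAAA"))  # bad case: prints 3988
```

In the first run every alignment fails on its first comparison and the pattern jumps 4 positions; in the second, three characters match at each of the 997 alignments before the mismatch, and the bad character rule allows a shift of only 1.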
The brute-force (or naive) string search algorithm searches for a string, called the pattern,
within a larger string, called the text.
It checks whether the pattern matches the text starting at each index of the text.
If all characters of the pattern match, the search stops and reports that index.
If not, it shifts the pattern to the next index of the text and checks again.
Its worst-case time complexity is O(mn), where m is the length of the pattern and n is the length of the
text.
A brute-force algorithm is a straightforward approach to solving a problem. The term also refers to a
programming style that does not use any shortcuts to improve performance.
It is based on trial and error: the programmer relies on the computer's
processing speed to solve the problem, rather than applying more
advanced algorithms and techniques.
As a result, it often takes more time, and sometimes more space, than a more refined method.
A simple example of applying brute force is linearly searching for an element
in an array. Comparing each and every element of the array with the data being
searched for is a brute-force approach, because it is the most direct and
simple way to search for the given data in the array.
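The linear search mentioned above can be written as a short sketch; the function name `linear_search` is hypothetical. It compares the target with every element in turn and returns the first matching index, or -1.

```python
def linear_search(arr, target):
    """Brute-force search: compare target with every element in order."""
    for i, value in enumerate(arr):
        if value == target:
            return i
    return -1


print(linear_search([4, 8, 15, 16, 23, 42], 15))  # prints 2
```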
Example:
Pseudo-code
function bruteForcePatternMatch(T, P):
    n = length(T)
    m = length(P)
    for i from 0 to n - m:
        j = 0
        while j < m and T[i + j] == P[j]:
            j = j + 1
        if j == m:
            return i    // Pattern found at position i
    return -1           // Pattern not found
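A direct Python translation of the pseudo-code is sketched below (the name `brute_force_pattern_match` is a hypothetical rendering of `bruteForcePatternMatch`). `range(n - m + 1)` covers i = 0 to n - m inclusive and is empty when the pattern is longer than the text, so the function then returns -1.

```python
def brute_force_pattern_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):  # every possible starting index
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i  # pattern found at position i
    return -1  # pattern not found


print(brute_force_pattern_match("THIS IS A SAMPLE TEXT", "TEXT"))  # prints 17
```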
Proposed in 1975 by Alfred Aho and Margaret Corasick, the Aho-Corasick
algorithm is a much more efficient approach when
searching for a number of strings in a given text.
If we search for each pattern separately, for example by running the KMP
algorithm once per pattern, the text is scanned once for every pattern, which takes much longer when there are many patterns.
So, instead of searching for each pattern one by one, the algorithm does a little more
preprocessing and builds a single automaton from all the given words, which is then used to scan the text only once.
Because it matches the text against a whole set (a "dictionary") of patterns at once, it is
a kind of dictionary-matching algorithm. This algorithm works in 3 phases:
1. Go-To
2. Failure
3. Output
Example:
In this portion, we explain how the Aho-Corasick algorithm works for a
particular text and a given set of patterns.
1) Preprocessing: This step happens before the three phases of the algorithm
and is essential for this pattern-matching algorithm to work efficiently.
First, build a trie of all the words that are to be found in the given text.
Second, extend the trie into an automaton so that the search takes time linear in the length of the text (plus the number of matches reported).
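The trie-building step can be sketched as follows, assuming the classic example word set "he", "she", "his", "hers". The function name `build_trie` and the representation (a list of dictionaries, with state 0 as the root) are illustrative choices, not a fixed API.

```python
def build_trie(words):
    """Return (goto, out): goto[state] maps a character to the next state,
    out[state] is the set of word indices that end at that state."""
    goto = [{}]   # state 0 is the root
    out = [set()]
    for idx, word in enumerate(words):
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})          # create a new state
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(idx)              # word idx ends here
    return goto, out


goto, out = build_trie(["he", "she", "his", "hers"])
print(len(goto))  # prints 10 (the root plus 9 states)
```

Shared prefixes share states: "he" and "hers" both start with the path h, e, so the four words need only 9 states besides the root.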
2) Go-To: After building the trie, we move on to the first phase of pattern matching.
We consider every character that appears in the trie; if a character does not
have an outgoing edge from the root, we add an edge for it from the root back to the root itself. As a result, the go-to function never fails at the root.
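The go-to phase only changes the root, as sketched below. This assumes the alphabet is the set of characters appearing in the words, as the text describes; the name `build_goto` is hypothetical.

```python
def build_goto(words):
    """Build the trie, then add root self-loops for missing characters."""
    goto = [{}]  # state 0 is the root
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
    alphabet = {ch for word in words for ch in word}
    for ch in alphabet:
        goto[0].setdefault(ch, 0)  # missing root edge loops back to the root
    return goto


print(sorted(build_goto(["he", "she", "his", "hers"])[0].items()))
# prints [('e', 0), ('h', 1), ('i', 0), ('r', 0), ('s', 3)]
```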
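3) Failure: For each state, using breadth-first traversal, we find the longest
proper suffix of the string represented by that state that is also present in the trie. The failure link of the state points to the state for that suffix, and the search follows it whenever the go-to function has no edge for the next character.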
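The failure phase can be sketched with a queue, again under the same assumptions (hypothetical name `build_failure_links`, alphabet taken from the words). States at depth 1 fail to the root; for a deeper state, we follow failure links from its parent until some state has an edge for the same character.

```python
from collections import deque


def build_failure_links(words):
    """Return (goto, fail), where fail[state] is the state of the longest
    proper suffix of that state's string that is also a path in the trie."""
    goto = [{}]
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
    for ch in {c for word in words for c in word}:
        goto[0].setdefault(ch, 0)  # go-to phase: root self-loops
    fail = [0] * len(goto)
    queue = deque(nxt for nxt in goto[0].values() if nxt != 0)  # depth 1
    while queue:  # breadth-first: shallower states are finished first
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while ch not in goto[f]:  # terminates: the root has every character
                f = fail[f]
            fail[nxt] = goto[f][ch]
    return goto, fail


goto, fail = build_failure_links(["he", "she", "his", "hers"])
print(fail)  # prints [0, 0, 0, 0, 1, 2, 0, 3, 0, 3]
```

For example, state 5 ("she") fails to state 2 ("he"), because "he" is the longest proper suffix of "she" that is also a path in the trie.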
4) Output: For each state, the indices of all words that end at that state (including words reached through failure links) are stored in a bitmap, to ease the
retrieval process.