DSA String Matching - Part 3

The document outlines three string matching algorithms: Boyer-Moore, Brute-Force, and Aho-Corasick. The Boyer-Moore algorithm is efficient due to its use of heuristics to skip unnecessary comparisons, while the Brute-Force algorithm is a straightforward but less efficient approach. The Aho-Corasick algorithm is designed for searching multiple patterns simultaneously by building an automaton, significantly reducing search time.

Uploaded by

jaspinjose

CS4301-DSA I Department of CSE 2025-2026

CS4301- DATA STRUCTURES AND ALGORITHMS I


CONTENT BEYOND THE SYLLABUS
STRING MATCHING

String Matching Algorithms


1) Boyer Moore Algorithm for Pattern Matching
2) Brute-Force String Search Algorithm
3) Aho-Corasick Algorithm for Pattern Searching

1) Boyer Moore Algorithm for Pattern Matching

The Boyer Moore Algorithm is used to determine whether a given pattern is present
within a specified text. It compares the pattern against the text from right to left,
that is, starting from the last character of the pattern.
The task of searching for a particular pattern within a given string is known as the pattern
searching problem. For example, if the text is "THIS IS A SAMPLE TEXT" and the pattern
is "TEXT", then the output should be 17, which is the 0-based index of the first occurrence
of the pattern in the given text.
This algorithm was developed by Robert Boyer and J Strother Moore in 1977. It is
considered one of the most efficient and widely used algorithms for pattern matching in practice.

How does Boyer Moore Algorithm work?


In the previous chapters, we have seen the naive way to solve this problem, which
involves sliding the pattern over the text one position at a time and comparing each character.
However, this approach is slow in the worst case, as it takes O(n*m) time, where 'n' is the
length of the text and 'm' is the length of the pattern. The Boyer Moore algorithm improves on
this by preprocessing the pattern and using two heuristics to skip alignments that cannot match.
The two heuristics are as follows:
• Bad character heuristic: This heuristic uses a table that stores the last occurrence of
each character in the pattern. When a mismatch occurs at some character (the bad
character) in the text, the algorithm checks whether this character appears in the pattern.
If it does, the pattern is shifted so that the last occurrence of this character in the
pattern aligns with the bad character in the text. If it does not, the pattern is shifted
completely past the bad character.
• Good suffix heuristic: This heuristic uses a second table that stores shift information
for the case where part of the pattern has already matched. The matched part is a suffix
of the pattern (the good suffix). The pattern is shifted so that another occurrence of this
suffix in the pattern, or the longest prefix of the pattern that matches the end of the
suffix, aligns with the matched text. The search then continues from the new position.

The Boyer-Moore algorithm combines these two heuristics by choosing the larger of the
two shifts they suggest at each step. At each alignment, the pattern is compared with the text
starting from the last character of the pattern and moving backward. If there is a mismatch,
the algorithm applies the heuristics and shifts the pattern accordingly. If every character
matches, an occurrence is reported and the pattern is shifted again to look for further
occurrences. The algorithm stops when the pattern passes the end of the text (or at the first
complete match, if only one occurrence is needed).
The Boyer-Moore algorithm has a worst-case time complexity of O(nm), but it usually
performs much better than that. In the best case it needs only about n/m character
comparisons, that is, O(n/m), because it can skip characters of the text without
comparing them. This typically happens when the alphabet is large and the mismatched text
characters rarely occur in the pattern.
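
As a rough illustration of the bad character heuristic, the last-occurrence table and the shift it suggests could be sketched in Python as follows. The function names are made up for this example, and the convention that a character missing from the pattern counts as index -1 is an assumption of this sketch.

```python
def build_last_occurrence(pattern):
    """Map each character of the pattern to the index of its last occurrence."""
    last = {}
    for index, char in enumerate(pattern):
        last[char] = index  # later occurrences overwrite earlier ones
    return last


def bad_character_shift(last, j, bad_char):
    """Shift suggested when pattern[j] mismatches the text character bad_char.

    A character that is absent from the pattern is treated as index -1,
    which moves the pattern completely past the bad character.
    The shift is never smaller than 1, so the search always advances.
    """
    return max(1, j - last.get(bad_char, -1))


table = build_last_occurrence("ABC")
print(table)                               # {'A': 0, 'B': 1, 'C': 2}
print(bad_character_shift(table, 2, "A"))  # 2
print(bad_character_shift(table, 2, "E"))  # 3
```

With the pattern "ABC", a mismatch on its last character against 'A' shifts the pattern by 2 so that the two 'A's align, while a mismatch against 'E' (not in the pattern) shifts it by the full length 3.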

To illustrate how the Boyer-Moore algorithm works, let's consider an example:


Input:
Main string: "AABAAABCEDBABCDDEBC" and pattern: "ABC"
Output:
Pattern found at position: 5

Pattern found at position: 11
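
The output above can be reproduced with a simplified Boyer-Moore search. This is only a sketch: it uses the bad character heuristic alone and leaves out the good suffix table, so it is not the full algorithm. The function name is an assumption made for this example.

```python
def boyer_moore_bad_char(text, pattern):
    """Return all 0-based positions where pattern occurs in text.

    Simplified variant: only the bad character heuristic is applied.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Last occurrence of every character in the pattern.
    last = {char: index for index, char in enumerate(pattern)}

    matches = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare from the last character of the pattern backward.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1  # conservative shift after a full match
        else:
            # Align the last occurrence of the bad character, or skip past it.
            s += max(1, j - last.get(text[s + j], -1))
    return matches


for position in boyer_moore_bad_char("AABAAABCEDBABCDDEBC", "ABC"):
    print("Pattern found at position:", position)
```

Shifting by one position after a full match is a deliberately cautious choice here; a complete implementation would use the good suffix table to shift further.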


2) Brute-Force String Search Algorithm

The Brute-Force or Naive String Search algorithm searches for a string (also called the
pattern) within a larger string (the text).
It checks for character matches of the pattern at each index of the text.
If all characters of the pattern match the text at that index, a match is found and the search
stops (or the index is recorded, if all occurrences are wanted).
If not, it shifts to the next index of the text and checks again.
It has a worst-case complexity of O(mn), where m is the length of the pattern and n is the
length of the text.
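
To see where the O(mn) bound comes from, the small sketch below counts character comparisons. The helper name and the test strings are assumptions chosen for illustration: a text of repeated 'A's and a pattern that only fails on its last character force m comparisons at each of the n - m + 1 alignments.

```python
def count_comparisons(text, pattern):
    """Count the character comparisons made by the brute-force scan."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[i + j] != pattern[j]:
                break  # mismatch: move on to the next alignment
    return comparisons


# n = 10, m = 4: 7 alignments with 4 comparisons each.
print(count_comparisons("A" * 10, "AAAB"))  # 28
# When the first character never matches, one comparison per alignment suffices.
print(count_comparisons("Z" * 10, "AAAB"))  # 7
```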

A brute force algorithm is a straightforward approach to solving a problem. The term also
refers to a programming style that does not include any shortcuts to improve performance.
• It is based on trial and error: the programmer relies on the computer's raw processing
power to solve a problem, rather than applying more advanced algorithms and
techniques.
• It usually costs more time, and sometimes more space, than a specialized algorithm.
• A simple example of applying brute force is linearly searching for an element
in an array. When each and every element of an array is compared with the data to be
searched, it can be called a brute force approach, as it is the most direct and
simple way one could think of searching for the given data in the array.
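
The linear search mentioned above could look like this in Python. It is a minimal sketch, and the function name and sample data are invented for the example.

```python
def linear_search(items, target):
    """Compare every element with the target, one by one."""
    for index, value in enumerate(items):
        if value == target:
            return index  # first position where the target occurs
    return -1  # target is not in the list


print(linear_search([7, 3, 9, 4], 9))  # 2
print(linear_search([7, 3, 9, 4], 5))  # -1
```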

Brute Force Pattern Matching Algorithm


1. Start with the pattern window aligned at the beginning of the text.
2. At the current position, compare the characters in the pattern with the corresponding
characters in the text, from left to right.
3. If a mismatch is found, stop comparing at this position.
4. If all characters in the pattern match the corresponding characters in the text,
record the starting position of the match.
5. Move the pattern window one position to the right in the text.
6. Repeat steps 2-5 until the pattern window reaches the end of the text, that is, until
fewer than m characters of the text remain.

Pseudo-code
function bruteForcePatternMatch(T, P):
    n = length(T)
    m = length(P)

    for i from 0 to n - m:
        j = 0
        while j < m and T[i + j] == P[j]:
            j = j + 1
        if j == m:
            return i    // Pattern found at position i (first occurrence)
    return -1           // Pattern not found
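
A runnable Python version of the pseudo-code might look like the sketch below. Unlike the pseudo-code, which returns the first match only, this variant records every starting position; the function name is an assumption of this example.

```python
def brute_force_all_matches(text, pattern):
    """Return every 0-based index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):  # empty range when the pattern is longer
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(i)  # all m characters matched at position i
    return matches


print(brute_force_all_matches("AABAAABCEDBABCDDEBC", "ABC"))  # [5, 11]
```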

3) Aho-Corasick Algorithm for Pattern Searching

• Proposed in 1975 by Alfred Aho and Margaret Corasick, the Aho-Corasick Algorithm
is an efficient approach for searching for a number of strings in a given text at the
same time.
• If we search for each pattern separately, for example by running the KMP algorithm
once per pattern, the text is scanned once for every pattern, which takes much longer
when there are many patterns. And that is one of the things your interviewer
wouldn't like! So, instead of searching for each pattern one by one, we do a little more
programming and build a single automaton from all the given words.
• Because it matches a whole set (dictionary) of patterns in one pass over the text, it is
a kind of dictionary-matching algorithm. It relies on three functions, built in 3 phases:
1. Go-To
2. Failure
3. Output

Example:
In this portion, we walk through how the Aho-Corasick Algorithm processes a given
text and set of patterns. So, let's get started.
1) Preprocessing: This step happens before the search itself and is essential to the
pattern-matching algorithm.
First, build a trie of all the words that are to be found in the given string.
Second, extend the trie into an automaton so that the search time becomes linear in the
length of the text (plus the number of matches reported).
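
The trie-building step could be sketched as follows. The list-of-dictionaries layout, the function name, and the word list ("he", "she", "hers", "his") are all assumptions chosen for illustration; they do not come from the notes above.

```python
def build_trie(words):
    """Build a trie; state 0 is the root."""
    goto = [{}]       # goto[state][char] -> next state
    output = [set()]  # output[state] -> indices of words ending at this state
    for word_index, word in enumerate(words):
        state = 0
        for char in word:
            if char not in goto[state]:
                # Create a new state for a character not seen on this path.
                goto.append({})
                output.append(set())
                goto[state][char] = len(goto) - 1
            state = goto[state][char]
        output[state].add(word_index)
    return goto, output


goto, output = build_trie(["he", "she", "hers", "his"])
print(len(goto))  # 10 states, including the root
```

Words that share a prefix share states: "he" and "hers" follow the same two edges from the root before "hers" branches off.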

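2) Go-To: After building the trie, we move on to the first phase of pattern-matching.
The go-to function follows the edges of the trie. For every character that does not have an
edge at the root, we add an edge from the root back to the root itself, so the automaton
stays at the root on such characters.
3) Failure: For each state, visited using Breadth First Traversal, we find the longest
proper suffix of the string spelled by that state that is also a prefix of some pattern (that
is, a path from the root of the trie). The failure link points to that state and is followed
when the go-to function has no edge for the next character.

4) Output: For each state, the indices of all words that end at that state are stored in a
bitmap, to ease the retrieval process.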
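
Putting the three functions together, a compact Aho-Corasick sketch in Python could look like this. Several things here are assumptions of the example rather than part of the notes: the function names, the sample words, the use of Python sets in place of a bitmap for the output function, and modelling the root's edges back to itself with a default of state 0.

```python
from collections import deque


def build_automaton(words):
    # Go-To: a trie stored as a list of dictionaries; state 0 is the root.
    goto, output = [{}], [set()]
    for word_index, word in enumerate(words):
        state = 0
        for char in word:
            if char not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][char] = len(goto) - 1
            state = goto[state][char]
        output[state].add(word_index)

    # Failure: breadth-first traversal; depth-1 states fail to the root.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for char, child in goto[state].items():
            queue.append(child)
            fallback = fail[state]
            while fallback and char not in goto[fallback]:
                fallback = fail[fallback]
            fail[child] = goto[fallback].get(char, 0)
            # Output: inherit the words that end at the failure state.
            output[child] |= output[fail[child]]
    return goto, fail, output


def search(text, words):
    """Return (start index, word) for every occurrence of every word."""
    goto, fail, output = build_automaton(words)
    state, matches = 0, []
    for position, char in enumerate(text):
        while state and char not in goto[state]:
            state = fail[state]
        state = goto[state].get(char, 0)  # missing edge at root stays at root
        for word_index in sorted(output[state]):
            word = words[word_index]
            matches.append((position - len(word) + 1, word))
    return matches


print(search("ushers", ["he", "she", "hers", "his"]))
# [(2, 'he'), (1, 'she'), (2, 'hers')]
```

The text is scanned once, and the failure links let the automaton report "he" inside "she" without restarting the scan.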
