BNP Unit-5 Lecture 20 KMP 5.2

The document discusses the Knuth-Morris-Pratt (KMP) algorithm for string matching, which operates in linear time, Θ(n + m), by using a precomputed auxiliary function, π, instead of a transition function. It details the components of the KMP algorithm, including the Prefix Function and the KMP Matcher, along with their respective running time analyses. Additionally, it provides numerical examples for computing the prefix function for given patterns.

Uploaded by

aniketpsingh2004

Design & Analysis of Algorithms

(KCS-503)
Unit-5
String matching
Course Outline:-
⮚ Knuth-Morris-Pratt String Matching
The Knuth-Morris-Pratt algorithm
We now present a linear-time string-matching algorithm due to Knuth, Morris, and
Pratt. Their algorithm achieves a Θ(n + m) running time, where n is the length of
the text and m is the length of the pattern, by avoiding the computation of the
transition function δ altogether. It does the pattern matching using just an
auxiliary function π[1 . . m], precomputed from the pattern in time O(m). The array
π allows the transition function δ to be computed efficiently (in an amortized
sense) "on the fly" as needed. Roughly speaking, for any state q = 0, 1, . . . , m
and any character a ∈ Σ, the value π[q] contains the information that is
independent of a and is needed to compute δ(q, a). Since the array π has only m
entries, whereas δ has O(m |Σ|) entries, we save a factor of |Σ| in the
preprocessing by computing π rather than δ.
Components of KMP Algorithm
1. The Prefix Function (Π): The prefix function Π for a pattern encapsulates
knowledge about how the pattern matches against shifts of itself. This
information can be used to avoid useless shifts of the pattern P. In other words,
it avoids backtracking over the text T.

2. The KMP Matcher: With the text T, the pattern P, and the prefix function Π as
inputs, it finds the occurrences of P in T and reports the shifts of P at which
they occur.
The Prefix Function (Π)
COMPUTE-PREFIX-FUNCTION (P)
1. m ← length [P]                 // P is the pattern to be matched
2. Π [1] ← 0
3. k ← 0
4. for q ← 2 to m
5.     do while k > 0 and P [k + 1] ≠ P [q]
6.            do k ← Π [k]
7.        if P [k + 1] = P [q]
8.            then k ← k + 1
9.        Π [q] ← k
10. return Π
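The pseudocode above uses 1-based indexing. A minimal Python sketch of the same procedure with 0-based indexing follows; the function name `compute_prefix_function` and the list `pi` are our own choices, and `pi[q]` here corresponds to Π[q + 1] in the pseudocode.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """Return pi, where pi[q] is the length of the longest proper prefix
    of pattern[:q + 1] that is also a suffix of it (0-based sketch)."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the currently matched prefix
    for q in range(1, m):
        # Fall back to shorter borders until the next character can extend one.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```

The fallback `k = pi[k - 1]` plays the role of step 6 (k ← Π[k]); the index is shifted by one only because Python lists start at 0.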
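Running Time Analysis:
In the above pseudocode for calculating the prefix function, the for loop
of steps 4 to 9 runs m − 1 times, and steps 1 to 3 take constant time. The
while loop of steps 5 and 6 costs O(m) in total over all iterations: k is
incremented at most once per iteration of the for loop (step 8), so it
increases at most m − 1 times, and every execution of step 6 strictly
decreases k, which never becomes negative.
Hence the running time of computing the prefix function is Θ(m).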
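The linear bound can be checked empirically by counting how often the fallback of step 6 runs. The sketch below is an instrumented variant written for illustration only; the name `prefix_function_with_count` and the sample patterns are assumptions, not part of the lecture.

```python
def prefix_function_with_count(pattern: str) -> tuple[list[int], int]:
    """Compute the prefix function (0-based) and count executions of the
    fallback step k <- pi[k] across the whole run."""
    m = len(pattern)
    pi = [0] * m
    k = 0
    fallbacks = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
            fallbacks += 1  # each fallback strictly decreases k
        if pattern[k] == pattern[q]:
            k += 1          # k grows by at most 1 per iteration
        pi[q] = k
    return pi, fallbacks


for p in ["ababaca", "aaaab", "abcabcabd"]:
    pi, fallbacks = prefix_function_with_count(p)
    # Total fallbacks cannot exceed total increments, which is at most m - 1.
    assert fallbacks <= len(p) - 1
    print(p, pi, fallbacks)
```

Because k can only be decreased as many times as it was previously increased, the total number of fallbacks stays below m even when a single iteration falls back several times.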
Numericals

Q1: Compute the prefix function for the pattern
ababbabbababbababbabb when the alphabet is Σ = {a, b}.

Q2: Compute the prefix function for the pattern ababaca when the
alphabet is Σ = {a, b, c}.
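Hand-computed answers can be cross-checked against the definition of the prefix function: Π[q] is the length of the longest proper prefix of P[1 . . q] that is also a suffix of P[1 . . q]. The following is a deliberately naive sketch of that definition, not the linear-time procedure; the function name `prefix_function_by_definition` is hypothetical.

```python
def prefix_function_by_definition(pattern: str) -> list[int]:
    """Brute-force prefix function straight from the definition.
    Far slower than the KMP preprocessing, but easy to trust for checking."""
    pi = []
    for q in range(1, len(pattern) + 1):
        prefix = pattern[:q]
        best = 0
        for k in range(1, q):  # proper prefixes only, so k < q
            if prefix[:k] == prefix[q - k:]:
                best = k
        pi.append(best)
    return pi


print(prefix_function_by_definition("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
print(prefix_function_by_definition("ababbabbababbababbabb"))
```

The first line is the answer to Q2, written with 0-based positions; the second line prints a table to compare against a hand-worked answer for Q1.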
The KMP Matcher
KMP-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. Π ← COMPUTE-PREFIX-FUNCTION (P)
4. q ← 0                          // number of characters matched
5. for i ← 1 to n                 // scan T from left to right
6.     do while q > 0 and P [q + 1] ≠ T [i]
7.            do q ← Π [q]        // next character does not match
8.        if P [q + 1] = T [i]
9.            then q ← q + 1      // next character matches
10.       if q = m                // is all of P matched?
11.           then print "Pattern occurs with shift" i − m
12.                q ← Π [q]      // look for the next match
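A minimal Python sketch of the matcher, again with 0-based indexing, is given below. The names `compute_prefix_function` and `kmp_matcher`, the choice to return a list of shifts instead of printing, and the handling of an empty pattern are our own assumptions.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """0-based prefix function: pi[q] is the longest border of pattern[:q + 1]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []  # assumption: an empty pattern reports no matches
    pi = compute_prefix_function(pattern)
    shifts = []
    q = 0  # number of pattern characters matched so far
    for i in range(n):  # the text index never moves backwards
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]  # next character does not match
        if pattern[q] == text[i]:
            q += 1         # next character matches
        if q == m:         # all of the pattern matched
            shifts.append(i - m + 1)
            q = pi[q - 1]  # look for the next, possibly overlapping, match
    return shifts


print(kmp_matcher("abababa", "aba"))  # [0, 2, 4]
```

With a 0-based loop index the shift is `i - m + 1`, which equals the pseudocode's i − m for a 1-based i. Resetting q to the border length after a match is what lets overlapping occurrences be reported.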
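Running Time Analysis

The for loop beginning in step 5 runs n times, once for each character of the
text T. Steps 1, 2, and 4 take constant time, and step 3 takes Θ(m). Inside the
loop, q increases at most once per iteration (step 9) and never becomes negative,
while every execution of step 7 strictly decreases it, so step 7 runs at most n
times in total. Thus the running time of the matching phase is Θ(n), and the total
running time including the preprocessing is Θ(n + m).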
Pattern P has been found to occur in the text T. The match is completed
at i = 13 and the pattern has length m = 7, so the pattern occurs with shift
i − m = 13 − 7 = 6.
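The text T of this worked example is not reproduced in these notes. Purely as an illustration, assume the text bacbabababacaab and the pattern ababaca, chosen here so that the numbers agree with i = 13 and m = 7. The shift arithmetic can then be checked with Python's built-in `str.find`, which is not KMP but returns the same 0-based shift of the first occurrence:

```python
text = "bacbabababacaab"  # assumed text, not taken from the slides
pattern = "ababaca"       # assumed pattern of length m = 7

m = len(pattern)
shift = text.find(pattern)  # 0-based start of the first occurrence, -1 if absent
i = shift + m               # 1-based index of the last matched text character

print(f"i = {i}, m = {m}, shift = i - m = {i - m}")
# prints: i = 13, m = 7, shift = i - m = 6
```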
The End

B N Pandey 7/5/2020
