String Searching Algorithm

This document discusses several string searching algorithms: Naive, Knuth-Morris-Pratt, Shift-OR, Boyer-Moore, Boyer-Moore-Horspool, and Karp-Rabin. It explains the basic ideas and provides examples for each algorithm.
Copyright
© Attribution Non-Commercial (BY-NC)

String Searching Algorithm

 Advisor: Professor 黃三益
 Group members: 9142639 蔡嘉文
9142642 高振元
9142635 丁康迪
String Searching Algorithm
 Outline:
 The Naive Algorithm
 The Knuth-Morris-Pratt Algorithm
 The SHIFT-OR Algorithm
 The Boyer-Moore Algorithm
 The Boyer-Moore-Horspool Algorithm
 The Karp-Rabin Algorithm
 Conclusion
String Searching Algorithm
 Preliminaries:
 n: the length of the text
 m: the length of the pattern (string)
 c: the size of the alphabet
 C_n: the expected number of comparisons performed by an algorithm while searching for the pattern in a text of length n
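The Naive Algorithm
void naivesearch(char text[], int n, char pat[], int m)
/* Search pat[1..m] in text[1..n]; index 0 of both arrays is unused */
{
    int i, j, k, lim;

    lim = n - m + 1;                 /* last possible starting position */
    for (i = 1; i <= lim; i++)       /* search */
    {
        k = i;
        for (j = 1; j <= m && text[k] == pat[j]; j++) k++;
        if (j > m) Report_match_at_position(i);   /* the match starts at i */
    }
}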
The Naive Algorithm (cont.)
 The idea is to try to match every substring of length m in the text against the pattern.
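To connect the naive idea with the comparison count C_n from the preliminaries, here is a minimal 0-based sketch that also counts character comparisons. The names `naive_result` and `naive_search` are illustrative and do not come from the slides.

```c
#include <stddef.h>
#include <string.h>

/* Result of a naive scan (hypothetical helper type for this sketch). */
struct naive_result {
    long first;          /* 0-based index of the first match, or -1 */
    size_t matches;      /* number of matches found */
    size_t comparisons;  /* number of character comparisons performed */
};

struct naive_result naive_search(const char *text, const char *pat)
{
    struct naive_result r = { -1, 0, 0 };
    size_t n = strlen(text), m = strlen(pat), i, j;

    if (m == 0 || m > n) return r;
    for (i = 0; i + m <= n; i++) {          /* every substring of length m */
        for (j = 0; j < m; j++) {
            r.comparisons++;
            if (text[i + j] != pat[j]) break;
        }
        if (j == m) {                       /* all m characters matched */
            if (r.first < 0) r.first = (long)i;
            r.matches++;
        }
    }
    return r;
}
```

On the text `aaaaab` with the pattern `aab`, every one of the four alignments needs three comparisons, which shows how the cost can approach m comparisons per text position.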
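The Knuth-Morris-Pratt Algorithm
void kmpsearch(char text[], int n, char pat[], int m)
/* Search pat[1..m] in text[1..n]; assumes pat[m+1] holds a character
   that does not occur in the text (e.g. the string terminator) */
{
    int j, k;
    int next[Max_Pattern_Size];

    initnext(pat, m+1, next);   /* preprocess pattern: build the next table */
    j = k = 1;
    do {                        /* search */
        if (j == 0 || text[k] == pat[j]) { k++; j++; }
        else j = next[j];
        if (j > m) {
            Report_match_at_position(k-m);
            j = next[m+1];      /* resume the search after a match */
        }
    } while (k <= n);
}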
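The Knuth-Morris-Pratt Algorithm (cont.)
 The algorithm never backs up in the text after a mismatch. To accomplish this, the pattern is preprocessed to obtain a table that gives the next position in the pattern to be processed after a mismatch.
 Ex:
position: 1 2 3 4 5 6 7 8 9 10 11
pattern:  a b r a c a d a b r  a
next[j]:  0 1 1 0 2 0 2 0 1 1  0
text:     a b r a c a f ...
 Here the mismatch occurs at position 7 (d against f); next[7] = 2, so the search resumes by comparing pat[2] with the same text character.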
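The slides call `initnext` without showing it. The following is one possible sketch of such a preprocessing step, kept 1-based like the slides (index 0 of each string is an unused pad character). The names `init_next`, `kmp_first_match`, and `MAX_PATTERN_SIZE` are assumptions for illustration, and the search stops at the first match to keep the example short.

```c
#define MAX_PATTERN_SIZE 64   /* assumed upper bound on the pattern length */

/* Build the next table for pat[1..m] (1-based; pat[0] is unused).
   next[i] is the pattern position to try after a mismatch at position i,
   skipping positions that hold the same character as pat[i]. */
void init_next(const char *pat, int m, int *next)
{
    int i = 1, j = 0;

    next[1] = 0;
    while (i < m) {
        while (j > 0 && pat[i] != pat[j]) j = next[j];
        i++; j++;
        next[i] = (pat[i] == pat[j]) ? next[j] : j;
    }
}

/* Return the 1-based position of the first match of pat[1..m]
   in text[1..n], or 0 if there is none. */
int kmp_first_match(const char *text, int n, const char *pat, int m)
{
    int next[MAX_PATTERN_SIZE + 1];
    int j = 1, k = 1;

    if (m < 1 || m > MAX_PATTERN_SIZE) return 0;
    init_next(pat, m, next);
    while (k <= n) {
        if (j == 0 || text[k] == pat[j]) { k++; j++; }
        else j = next[j];                 /* text index k does not move back */
        if (j > m) return k - m;
    }
    return 0;
}
```

Called as `init_next(" abracadabra", 11, next)`, this fills positions 1 to 11 with the values listed in the table above.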
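The Shift-Or Algorithm
 The main idea is to represent the state of the search as a number.
 $State = S_1 \cdot 2^0 + S_2 \cdot 2^1 + \dots + S_m \cdot 2^{m-1}$, where S_i is 0 if the first i characters of the pattern match the last i characters read from the text, and 1 otherwise.
 $T_x = \delta(pat_1 = x) \cdot 2^0 + \delta(pat_2 = x) \cdot 2^1 + \dots + \delta(pat_m = x) \cdot 2^{m-1}$
 T_x is defined for every symbol x of the alphabet, where δ(C) is 0 if the condition C is true, and 1 otherwise.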
The Shift-Or Algorithm (cont.)
 Ex: let {a, b, c, d} be the alphabet and ababc the pattern.
T[a]=11010, T[b]=10101, T[c]=01111, T[d]=11111
The initial state is 11111.
The Shift-Or Algorithm (cont.)
 Pattern: ababc
 Each step computes State = (State << 1) | T[x] for the current text character x, keeping the low m bits.
 Text:  a     b     d     a     b     a     b     c
 T[x]:  11010 10101 11111 11010 10101 11010 10101 01111
 State: 11110 11101 11111 11110 11101 11010 10101 01111
 For example, the state 10101 means that at the current position we have two partial matches to the left, of lengths two and four, respectively.
 The match at the end of the text is indicated by the value 0 in the leftmost bit of the state of the search.
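The trace above can be reproduced with a few lines of C. This is a minimal sketch, assuming the pattern fits in one `unsigned long` and that the state is updated as (State << 1) | T[x]; the function names `so_build_masks` and `so_search` are made up for this example.

```c
#include <limits.h>
#include <string.h>

/* Fill T so that bit i of T[x] is 0 exactly when pat[i] == x (0-based i).
   Bit 0 corresponds to the first pattern character. */
void so_build_masks(const char *pat, size_t m, unsigned long T[UCHAR_MAX + 1])
{
    size_t i;

    for (i = 0; i <= UCHAR_MAX; i++) T[i] = ~0UL;
    for (i = 0; i < m; i++) T[(unsigned char)pat[i]] &= ~(1UL << i);
}

/* Return the 0-based start of the first match, or -1. */
long so_search(const char *text, const char *pat)
{
    unsigned long T[UCHAR_MAX + 1];
    unsigned long state = ~0UL;             /* all ones: no partial match */
    size_t n = strlen(text), m = strlen(pat), i;

    if (m == 0 || m > sizeof(unsigned long) * CHAR_BIT) return -1;
    so_build_masks(pat, m, T);
    for (i = 0; i < n; i++) {
        state = (state << 1) | T[(unsigned char)text[i]];
        if ((state & (1UL << (m - 1))) == 0)   /* bit m-1 clear: full match */
            return (long)(i + 1 - m);
    }
    return -1;
}
```

For the pattern `ababc`, the low five bits of `T['a']` are 11010, and `so_search("abdababc", "ababc")` reports the match that starts at 0-based index 3, in line with the last column of the trace.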
The Boyer-Moore Algorithm
 Search from right to left in the pattern
 Shift method :
 match heuristic
compute the dd table for the pattern
 occurrence heuristic
compute the d table for the pattern
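As an illustration of the occurrence heuristic only, the sketch below builds a d table under one common definition, assumed here: d[x] is the distance from the rightmost occurrence of x in the pattern to the end of the pattern, or m if x does not occur. The name `bm_occurrence_table` is hypothetical, and the dd table of the match heuristic is not covered.

```c
#include <limits.h>
#include <string.h>

/* d[x] = distance from the rightmost occurrence of x in pat to the end
   of pat, or m (the pattern length) if x does not occur in pat. */
void bm_occurrence_table(const char *pat, size_t d[UCHAR_MAX + 1])
{
    size_t m = strlen(pat), i;

    for (i = 0; i <= UCHAR_MAX; i++) d[i] = m;
    /* Later occurrences overwrite earlier ones, so the rightmost wins. */
    for (i = 0; i < m; i++) d[(unsigned char)pat[i]] = m - 1 - i;
}
```

For the pattern `abracadabra` this gives d[a] = 0, d[r] = 1, d[b] = 2, d[d] = 4, d[c] = 6, and 11 for every other character. Because d can be 0, a search loop has to combine it with the dd table (or another positive shift) to guarantee progress.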
The Boyer-Moore Algorithm (cont.)
[Figure: match shift]
The Boyer-Moore Algorithm (cont.)
[Figure: occurrence shift]
The Boyer-Moore Algorithm (cont.)
k = m;
while (k <= n) {
    j = m;
    while (j > 0 && text[k] == pat[j]) { j--; k--; }
    if (j == 0) {
        report_match_at_position(k+1);
        k += m + 1;   /* simplest safe advance: move the pattern one position right */
    }
    else k += max(d[text[k]], dd[j]);
}
The Boyer-Moore Algorithm (cont.)
 Example
T: xyxabraxyzabracadabra
P: abracadabra
[Figure: mismatch, compute a shift]
The Boyer-Moore-Horspool Algorithm
 A simplification of the BM algorithm
 Compares the pattern from left to right
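The Boyer-Moore-Horspool Algorithm (cont.)
for (k = 0; k < MAX_ALPHABET_SIZE; k++) d[k] = m+1;   /* default shift; MAX_ALPHABET_SIZE is an assumed constant */
for (k = 1; k <= m; k++) d[pat[k]] = m+1-k;
pat[m+1] = CHARACTER_NOT_IN_THE_TEXT;   /* sentinel: stops the comparison loop */
lim = n-m+1;
for (k = 1; k <= lim; k += d[text[k+m]])   /* search */
{
    i = k;
    for (j = 1; text[i] == pat[j]; j++) i++;
    if (j == m+1) report_match_at_position(k);
}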
The Boyer-Moore-Horspool Algorithm (cont.)
 Example:
T: xyzabraxyzabracadabra
P: abracadabra
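The example can be run with a small 0-based sketch. It follows the variant in the slide code, which picks the shift from the text character just past the current window, and it replaces the sentinel character with explicit bounds checks. The name `bmh_search` is illustrative, not from the slides.

```c
#include <limits.h>
#include <string.h>

/* Return the 0-based index of the first occurrence of pat in text, or -1. */
long bmh_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    size_t d[UCHAR_MAX + 1];
    size_t k, j;

    if (m == 0) return 0;
    if (m > n) return -1;

    /* Shift table: m+1 by default, m-k for the rightmost pat[k] == x. */
    for (k = 0; k <= UCHAR_MAX; k++) d[k] = m + 1;
    for (k = 0; k < m; k++) d[(unsigned char)pat[k]] = m - k;

    k = 0;
    while (k + m <= n) {
        for (j = 0; j < m && text[k + j] == pat[j]; j++)
            ;                                /* compare left to right */
        if (j == m) return (long)k;
        if (k + m >= n) break;               /* no character after the window */
        k += d[(unsigned char)text[k + m]];
    }
    return -1;
}
```

On T and P above, the window starts at 0-based positions 0, 3, and 10: the character after the first window is `b` (shift 3), the character after the second is `c` (shift 7), and the third alignment matches.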
The Karp-Rabin Algorithm
 Uses hashing
 Compute the signature function of each possible m-character substring of the text
 Check whether it equals the signature function of the pattern
 Signature function: h(k) = k mod q, where q is a large prime
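A rolling signature of this kind might be sketched as follows. The radix `RK_BASE`, the modulus `RK_PRIME`, and the function name `rk_search` are assumptions chosen for the example; every signature hit is verified character by character, so the result does not depend on the modulus being collision-free.

```c
#include <stdint.h>
#include <string.h>

#define RK_BASE  256u       /* assumed radix: one byte per symbol */
#define RK_PRIME 1000003u   /* assumed modulus q, a large prime */

/* Return the 0-based index of the first occurrence of pat in text, or -1. */
long rk_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat), i, j;
    uint64_t hp = 0, ht = 0, high = 1;

    if (m == 0) return 0;
    if (m > n) return -1;

    /* high = RK_BASE^(m-1) mod RK_PRIME, the weight of the leading symbol */
    for (i = 1; i < m; i++) high = (high * RK_BASE) % RK_PRIME;

    /* Signatures of the pattern and of the first text window */
    for (i = 0; i < m; i++) {
        hp = (hp * RK_BASE + (unsigned char)pat[i]) % RK_PRIME;
        ht = (ht * RK_BASE + (unsigned char)text[i]) % RK_PRIME;
    }

    for (i = 0; ; i++) {
        if (hp == ht) {                      /* potential match: verify */
            for (j = 0; j < m && text[i + j] == pat[j]; j++)
                ;
            if (j == m) return (long)i;
        }
        if (i + m >= n) break;               /* no further window */
        /* Drop text[i], then append text[i+m] */
        ht = (ht + RK_PRIME - ((unsigned char)text[i] * high) % RK_PRIME) % RK_PRIME;
        ht = (ht * RK_BASE + (unsigned char)text[i + m]) % RK_PRIME;
    }
    return -1;
}
```

Updating the signature costs a constant number of operations per text position, which is what makes comparing signatures cheaper than comparing whole substrings.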
The Karp-Rabin Algorithm (cont.)
rksearch( text, n, pat, m ) /* Search pat[1..m] in text[1..n] */
char text[], pat[]; /* (0 < m <= n) */
int n, m;
{
int h1, h2, dM, i, j;
dM = 1;
for( i=1; i<m; i++ ) dM = (dM << D) % Q; /* dM = 2^(D*(m-1)) mod Q */
h1 = h2 = 0; /* Compute the signature of the pattern */
for( i=1; i<=m; i++ ) /* and of the beginning of the text */
{
h1 = ((h1 << D) + pat[i] ) % Q;
h2 = ((h2 << D) + text[i] ) % Q;
}
The Karp-Rabin Algorithm (cont.)
for( i = 1; i <= n-m+1; i++ ) /* Search */
{
if( h1 == h2 ) /* Potential match */
{
for(j=1; j<=m && text[i-1+j] == pat[j]; j++ ); /* check */
if( j > m ) /* true match */
Report_match_at_position( i );
}
h2 = (h2 + (Q << D) - text[i]*dM ) % Q; /* update the signature */
h2 = ((h2 << D) + text[i+m] ) % Q; /* of the text */
}
}
Conclusions
 Tests: random pattern, random text, and English text
 Best: the Boyer-Moore-Horspool Algorithm
 Drawback: preprocessing time and space (depend on alphabet/pattern size)
 Small patterns: the Shift-Or Algorithm
 Large alphabets: the Knuth-Morris-Pratt Algorithm
 Other cases: the Boyer-Moore Algorithm
 Patterns with "don't care" symbols: the Shift-Or Algorithm
