
lOMoARc PSD|18 878400

www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

UNIT V - data structure notes r18 jntuh

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh



UNIT - V

Pattern Matching and Tries: Pattern matching algorithms: Brute force, the Boyer-Moore
algorithm, the Knuth-Morris-Pratt algorithm, Standard Tries, Compressed Tries, Suffix Tries.

Pattern Matching

Pattern searching is an important problem in computer science. When we search for
a string in a Notepad or Word file, a browser, or a database, pattern searching
algorithms are used to produce the search results.
A typical problem statement would be:
Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[],
char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m.
Examples:
Input: txt[] = "THIS IS A TEST TEXT"
pat[] = "TEST"
Output: Pattern found at index 10
Input: txt[] = "AABAACAADAABAABA"
pat[] = "AABA"
Output: Pattern found at index 0
Pattern found at index 9
Pattern found at index 12
Different Types of Pattern Matching Algorithms
1. Naive Algorithm or Brute Force Algorithm
2. Boyer-Moore Algorithm
3. Knuth-Morris-Pratt (KMP) Algorithm

Naive Algorithm or Brute Force Algorithm

The simplest string matching technique is the one almost everyone thinks of first.
Start with the first letter of the text and the first letter of the pattern and check
whether the two letters are equal. If they are, check the second letters of the text
and the pattern. If they are not, move the pattern so that its first letter is aligned
with the second letter of the text, and compare again.

The Brute Force string matching algorithm works exactly like that, which is why it is
also called the Naive string matching algorithm. Naive here means basic.

Brute Force Algorithm

do
    if (text letter == pattern letter)
        compare next letter of pattern to next letter of text
    else
        move pattern down text by one letter and restart at the first letter of pattern
while (entire pattern not found and not at end of text)
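The pseudocode above can be sketched in Python roughly as follows. This is only a minimal illustration: the function name brute_force_search and the choice to return a list of indices (instead of printing them) are assumptions made here, not part of the notes.

```python
def brute_force_search(text, pattern):
    """Return every index at which pattern occurs in text (naive method)."""
    n, m = len(text), len(pattern)
    matches = []
    # Try every alignment of the pattern against the text.
    for shift in range(n - m + 1):
        j = 0
        # Compare letter by letter until a mismatch or the end of the pattern.
        while j < m and text[shift + j] == pattern[j]:
            j += 1
        if j == m:  # the entire pattern matched at this alignment
            matches.append(shift)
    return matches


print(brute_force_search("THIS IS A TEST TEXT", "TEST"))   # [10]
print(brute_force_search("AABAACAADAABAABA", "AABA"))      # [0, 9, 12]
```

The two calls reproduce the sample inputs given in the problem statement.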

Let us learn this method using an example.

EXAMPLE 1

Let our text (T) be

THIS IS A SIMPLE EXAMPLE
and our pattern (P) be
SIMPLE

In the figure, red boxes mark pattern letters that mismatch the text and green boxes
mark pattern letters that match the text.

In the first row we check whether the first letter of the pattern matches the first
letter of the text. It is a mismatch, because "S" is the first letter of the pattern and
"T" is the first letter of the text. We then move the pattern by one position, as shown
in the second row.

Red boxes: mismatch. Green boxes: match.


Next we compare the first letter of the pattern with the second letter of the text. It
is also a mismatch. We continue checking and moving in the same way. In the fourth row
the first letter of the pattern matches the text, so we do not move the pattern;
instead we advance to the next letter of the pattern. The pattern is moved by one
position only when a mismatch is found. In the last row, all the letters of the pattern
match consecutive letters of the text.

Example 2

Running Time Analysis Of Brute Force String Matching Algorithm

Worst Case

Given a pattern M characters in length, and a text N characters in length...


• Worst case: compares the pattern to each substring of the text of length M.
For example, M=5.

• Total number of comparisons: M (N-M+1)
• Worst case time complexity: Ο(MN)

Best case

Given a pattern M characters in length, and a text N characters in length...


• Best case if pattern found: finds the pattern in the first M positions of the text.
For example, M=5.
AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAA 5 comparisons made
• Total number of comparisons: M
• Best case time complexity: Ο(M)
• Best case if pattern not found: always mismatches on the first character.
For example, M=5.
• Total number of comparisons: N-M+1 (one per alignment), which is at most N
• Best case time complexity: Ο(N)
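The comparison counts above can be checked with a small counting sketch. The helper count_comparisons and its stop_at_first_match flag are hypothetical names introduced only for this illustration; the test strings use N=10 and M=5 and are likewise made up.

```python
def count_comparisons(text, pattern, stop_at_first_match=False):
    """Count the letter comparisons made by the brute force method."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for shift in range(n - m + 1):
        matched = True
        for j in range(m):
            comparisons += 1
            if text[shift + j] != pattern[j]:
                matched = False
                break
        if matched and stop_at_first_match:
            break
    return comparisons


# Worst case: every alignment needs all M comparisons, M(N-M+1) = 5 * 6.
print(count_comparisons("A" * 10, "AAAAH"))                         # 30
# Best case, pattern not found: one comparison per alignment, N-M+1.
print(count_comparisons("A" * 10, "HAAAA"))                         # 6
# Best case, pattern found at the very start: M comparisons.
print(count_comparisons("A" * 27 + "H", "AAAAA", stop_at_first_match=True))  # 5
```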


Advantages

1. It is a very simple technique and does not require any preprocessing. Therefore
the total running time is the same as the matching time.

Disadvantages

1. It is very inefficient, because the pattern moves forward by only one position
at a time.

Boyer Moore Algorithm for Pattern Searching

The B-M algorithm takes a backward approach. The pattern string (P) is aligned with the
start of the text string (T), and the characters of the pattern are then compared from
right to left, beginning with the rightmost character.

If the text character being compared does not occur in the pattern at all, no match
can be found by comparing any further characters at this position, so the pattern can
be shifted completely past the mismatching character.

To determine the possible shifts, the B-M algorithm uses two preprocessing strategies.
Whenever a mismatch occurs, the algorithm computes a shift with both strategies and
selects the longer one. Thus it makes use of the most efficient strategy for each
individual case.

NOTE: The Boyer-Moore algorithm starts matching from the last character of the pattern.

The two strategies are called the heuristics of B-M, as they are used to reduce the
search. They are

1) Bad Character Heuristic


2) Good Suffix Heuristic

Bad Character Heuristic

The idea of the bad character heuristic is simple. The character of the text that does
not match the current character of the pattern is called the bad character. Upon a
mismatch, we shift the pattern until either
1) the mismatch becomes a match, or
2) pattern P moves past the mismatched character.


Case 1 – Mismatch becomes a match

We look up the position of the last occurrence of the mismatching character in the
pattern. If the mismatching character exists in the pattern, we shift the pattern so
that this occurrence is aligned with the mismatching character in text T.

case 1

Explanation: In the above example, we got a mismatch at position 3. Here the
mismatching character is "A". We now search for the last occurrence of "A" in the
pattern. We find "A" at position 1 in the pattern (displayed in blue), and this is its
last occurrence. We therefore shift the pattern by 2 positions so that the "A" in the
pattern is aligned with the "A" in the text.

Case 2 – Pattern moves past the mismatched character

We look up the position of the last occurrence of the mismatching character in the
pattern. If the character does not exist in the pattern, we shift the pattern past the
mismatching character.

case2


Explanation: Here we have a mismatch at position 7. The mismatching character "C"
does not exist in the pattern before position 7, so we shift the pattern past
position 7. In the above example this eventually gives a perfect match of the pattern
(displayed in green). We do this because "C" does not exist in the pattern, so every
shift that keeps the pattern over position 7 would produce a mismatch and the search
would be fruitless.

Problem in Bad Character Heuristic

In some cases the bad character heuristic produces a negative shift, that is, it would
move the pattern backwards. This happens when the last occurrence of the bad character
in the pattern lies to the right of the mismatch position. For example:

This means we need some extra information to produce a shift on encountering a bad
character. The information needed is the last position of every character in the
pattern, together with the set of characters used in the pattern.
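As a rough sketch of the bad character heuristic on its own, the Python below stores the last position of every pattern character and uses it to compute the shift. The names build_last_occurrence and bad_character_search are assumptions for illustration. Two simplifications are also assumed here: a negative or zero shift is replaced by a shift of 1, and after a full match the pattern simply moves one position.

```python
def build_last_occurrence(pattern):
    """Map each character of the pattern to the index of its last occurrence."""
    last = {}
    for index, ch in enumerate(pattern):
        last[ch] = index
    return last


def bad_character_search(text, pattern):
    """Boyer-Moore search using only the bad character heuristic."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = build_last_occurrence(pattern)
    matches = []
    s = 0  # current shift of the pattern relative to the text
    while s <= n - m:
        j = m - 1
        # Compare from right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1  # simplification: move one position after a match
        else:
            bad = text[s + j]
            # Align the last occurrence of the bad character with the text,
            # or move past it if it is absent (-1). Never shift backwards.
            s += max(1, j - last.get(bad, -1))
    return matches


print(bad_character_search("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
```

The max(1, ...) guard is what protects against the negative shift described above.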

Good Suffix Heuristic

Let t be the substring of text T that is matched with a substring of pattern P. We now
shift the pattern until one of the following holds:
1) Another occurrence of t in P is matched with t in T.
2) A prefix of P matches a suffix of t.
3) P moves past t.

Case 1: Another occurrence of t in P matched with t in T

Pattern P might contain more occurrences of t. In that case, we try to shift the
pattern to align such an occurrence with t in text T. For example:

Explanation: In the above example, a substring t of text T is matched with pattern P
(in green) before the mismatch at index 2. We now search for an occurrence of t
("AB") in P. We find an occurrence starting at position 1 (yellow background), so we
shift the pattern right by 2 positions to align t in P with t in T. This is the weak
rule of the original Boyer-Moore algorithm.

Case 2: A prefix of P, which matches with suffix of t in T

We will not always find another occurrence of t in P. Sometimes there is no occurrence
at all. In such cases we can search for a suffix of t that matches a prefix of P and
try to align them by shifting P. For example:


Explanation: In the above example, t ("BAB") is matched with P (in green) at indices
2-4 before the mismatch. Because there is no other occurrence of t in P, we search for
a prefix of P that matches a suffix of t. We find the prefix "AB" (yellow background)
starting at index 0, which matches not the whole of t but the suffix "AB" of t starting
at index 3. So we shift the pattern by 3 positions to align the prefix with the suffix.

Case 3: P moves past t

If neither of the above two cases applies, we shift the pattern past t. For example:


Explanation: In the above example, there is no other occurrence of t ("AB") in P, and
there is also no prefix of P that matches a suffix of t. In that case we can never
find a perfect match before index 4, so we shift P past t, i.e. to index 5.

Strong Good Suffix Heuristic

Suppose the substring q = P[i to n] is matched with t in T and c = P[i-1] is the
mismatching character. Unlike case 1, we now search for an occurrence of t in P that
is not preceded by the character c. The closest such occurrence is then aligned with t
in T by shifting pattern P. For example:

Explanation: In the above example, q = P[7 to 8] is matched with t in T. The
mismatching character c is "C" at position P[6]. If we start searching for t in P, we
get the first occurrence of t starting at position 4. But this occurrence is preceded
by "C", which is equal to c, so we skip it and carry on searching. At position 1 we get
another occurrence of t (yellow background). This occurrence is preceded by "A" (in
blue), which is not equal to c. So we shift pattern P by 6 positions to align this
occurrence with t in T. We do this because we already know that the character c = "C"
causes the mismatch. Any occurrence of t preceded by c would cause the mismatch again
when aligned with t, so it is better to skip it.

Preprocessing for Good suffix heuristic

As a part of preprocessing, an array shift is created. Each entry shift[i] contains the
distance the pattern will shift if a mismatch occurs at position i-1. That is, the
suffix of the pattern starting at position i is matched and a mismatch occurs at
position i-1. Preprocessing is done separately for the strong good suffix rule and for
case 2 discussed above.


1) Preprocessing for Strong Good Suffix

Before discussing preprocessing, let us first discuss the idea of a border. A border is
a substring that is both a proper suffix and a proper prefix. For example, in the
string "ccacc", "c" is a border and "cc" is a border because each appears at both ends
of the string, but "cca" is not a border.

As a part of preprocessing, an array bpos (border position) is calculated. Each
entry bpos[i] contains the starting index of the border for the suffix starting at
index i in the given pattern P.
The empty suffix φ beginning at position m has no border, so bpos[m] is set to m+1,
where m is the length of the pattern.
The shift position is obtained from the borders that cannot be extended to the left.
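The border idea and the bpos and shift arrays described above can be sketched in Python as follows. This follows the commonly published two-phase preprocessing (strong good suffix first, then case 2); the function names are assumptions for illustration, and the search uses the good suffix heuristic only, without the bad character rule.

```python
def borders(s):
    """Return every non-empty border of s (a proper prefix that is also a proper suffix)."""
    return [s[:k] for k in range(1, len(s)) if s[:k] == s[-k:]]


def build_good_suffix_shift(pattern):
    """Return (bpos, shift) for the good suffix heuristic."""
    m = len(pattern)
    bpos = [0] * (m + 1)
    shift = [0] * (m + 1)

    # Phase 1: strong good suffix. The empty suffix has no border: bpos[m] = m + 1.
    i, j = m, m + 1
    bpos[i] = j
    while i > 0:
        # The border cannot be extended to the left: record a shift, try a shorter border.
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = bpos[j]
        i -= 1
        j -= 1
        bpos[i] = j

    # Phase 2: case 2, a prefix of the pattern matches a suffix of t.
    j = bpos[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = bpos[j]
    return bpos, shift


def good_suffix_search(text, pattern):
    """Boyer-Moore search using only the good suffix heuristic."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    _, shift = build_good_suffix_shift(pattern)
    matches = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += shift[0]
        else:
            s += shift[j + 1]  # mismatch at j, suffix starting at j + 1 matched
    return matches


print(borders("ccacc"))                                  # ['c', 'cc']
print(good_suffix_search("AABAACAADAABAABA", "AABA"))    # [0, 9, 12]
```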

Complexity of Boyer Moore Algorithm

This algorithm takes O(mn) time in the worst case and O(n log(m)/m) in the average
case, which is sublinear in the sense that not all characters are inspected.

Applications

This algorithm is highly useful in tasks such as recursively searching files for virus
patterns, searching databases for keys or data, text and word processing, and any
other task that requires handling large amounts of data at very high speed.

Knuth-Morris-Pratt (KMP) Algorithm for Pattern Searching

The Naive pattern searching algorithm does not work well in cases where we see many
matching characters followed by a mismatching character. Following are some examples.

txt[] = "AAAAAAAAAAAAAAAAAB"
pat[] = "AAAAB"

txt[] = "ABABABCABABABCABABABC"
pat[] = "ABABAC" (not a worst case, but a bad case for Naive)

The KMP algorithm is one of the most popular pattern matching algorithms. KMP stands
for Knuth-Morris-Pratt. The algorithm was invented by Donald Knuth and Vaughan Pratt
together, and independently by James H. Morris, in the year 1970. In the year 1977,
all three jointly published the KMP algorithm.


The KMP algorithm was the first linear time complexity algorithm for string matching.
It is used to find a "Pattern" in a "Text". The algorithm compares character by
character from left to right. Whenever a mismatch occurs, it uses a preprocessed table
called the "Prefix Table" to skip character comparisons while matching. The prefix
table is sometimes also known as the LPS Table. Here LPS stands for
"Longest proper Prefix which is also Suffix".

Steps for Creating LPS Table (Prefix Table)

• Step 1 - Define a one dimensional array with size equal to the length of the Pattern
(LPS[size]).
• Step 2 - Define variables i and j. Set i = 0, j = 1 and LPS[0] = 0.
• Step 3 - Compare the characters at Pattern[i] and Pattern[j].
• Step 4 - If both match, set LPS[j] = i+1 and increment both i and j by one.
Go to Step 3.
• Step 5 - If they do not match, check the value of variable 'i'. If it is '0', set
LPS[j] = 0 and increment 'j' by one; if it is not '0', set i = LPS[i-1]. Go to Step 3.
• Step 6 - Repeat the above steps until all the values of LPS[] are filled.
Let us use the above steps to create the prefix table for a pattern...
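Steps 1 to 6 can be followed almost literally in code. The Python below is a minimal sketch; the function name compute_lps is an assumption, and the sample patterns are taken from the examples earlier in this section or chosen for illustration.

```python
def compute_lps(pattern):
    """Build the LPS (prefix) table by following Steps 1 to 6."""
    m = len(pattern)
    lps = [0] * m            # Step 1 (LPS[0] = 0 from Step 2)
    i, j = 0, 1              # Step 2
    while j < m:             # Step 6: repeat until the table is filled
        if pattern[i] == pattern[j]:   # Steps 3 and 4
            lps[j] = i + 1
            i += 1
            j += 1
        elif i == 0:                   # Step 5, i is 0
            lps[j] = 0
            j += 1
        else:                          # Step 5, i is not 0
            i = lps[i - 1]
    return lps


print(compute_lps("AAAAB"))    # [0, 1, 2, 3, 0]
print(compute_lps("ABABAC"))   # [0, 0, 1, 2, 3, 0]
print(compute_lps("ABCDABD"))  # [0, 0, 0, 0, 1, 2, 0]
```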


How to use LPS Table

We use the LPS table to decide how many characters can be skipped in the comparison
when a mismatch has occurred.
When a mismatch occurs, check the LPS value of the character just before the
mismatched character in the pattern. If it is '0', start comparing the first character
of the pattern with the mismatched character in the text (if the mismatch was already
at the first character of the pattern, move on to the next character of the text). If
it is not '0', start comparing the pattern character at the index equal to that LPS
value with the mismatched character in the text.
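A minimal Python sketch of the matching phase is given below, under the assumption that the LPS table is built as in the steps for creating the prefix table. The names kmp_search and compute_lps are illustrative. Note that the text index i never moves backwards.

```python
def compute_lps(pattern):
    """LPS table: longest proper prefix that is also a suffix, for each position."""
    lps = [0] * len(pattern)
    i, j = 0, 1
    while j < len(pattern):
        if pattern[i] == pattern[j]:
            lps[j] = i + 1
            i += 1
            j += 1
        elif i == 0:
            lps[j] = 0
            j += 1
        else:
            i = lps[i - 1]
    return lps


def kmp_search(text, pattern):
    """Return every index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    lps = compute_lps(pattern)
    matches = []
    i = j = 0  # i indexes the text, j indexes the pattern
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:                 # whole pattern matched
                matches.append(i - m)
                j = lps[j - 1]
        elif j != 0:
            j = lps[j - 1]             # reuse the LPS value of the previous character
        else:
            i += 1                     # mismatch at the first pattern character
    return matches


print(kmp_search("AABAACAADAABAABA", "AABA"))          # [0, 9, 12]
print(kmp_search("ABABABCABABABCABABABC", "ABABAC"))   # []
```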

How the KMP Algorithm Works

Let us see a working example of KMP Algorithm to find a Pattern in a Text

EXAMPLE 1


Example 2


KMP ALGORITHM COMPLEXITY

• O(m) - time to compute the prefix function values
• O(n) - time to compare the pattern to the text
• O(n+m) - total time taken by the KMP algorithm

Advantages

• The running time of the KMP algorithm is O(n+m), which is very fast.
• The algorithm never needs to move backwards in the input text T. This makes the
algorithm good for processing very large files.

Disadvantages

• It does not work as well as the size of the alphabet increases, because the chance
of a mismatch becomes higher.


TRIES DATA STRUCTURE

A trie is an efficient information retrieval data structure. The term trie comes from
the word retrieval.

Definition of a Trie

• A data structure for representing a collection of strings.
• In computer science, a trie is also called a digital tree or prefix tree; a radix
tree is its compressed form.
• Tries support fast string matching.

Properties of Tries

• A multi-way tree.
• Each internal node has from 1 to n children, where n is the size of the alphabet.
• Each edge of the tree is labeled with a character.
• Each leaf node corresponds to a stored string, which is the concatenation of the
characters on the path from the root to this node.

EXAMPLE


Trie (Insert and Search)

A trie is an efficient information retrieval data structure. Using a trie, search
complexity can be brought down to an optimal limit (the key length).
Given multiple strings, the task is to insert the strings into a trie.

Examples:

Example 1: str = {"cat", "there", "caller", "their", "calling", "bat"}

            root
          /  |   \
         b   c     t
         |   |     |
         a   a     h
         |   | \   |
         t   l  t  e
             |     | \
             l     i  r
             | \   |  |
             e  i  r  e
             |  |
             r  n
                |
                g

Example 2: str = {"candy", "cat", "caller", "calling"}

          root
           |
           c
           |
           a
        /  |  \
       l   n   t
       |   |
       l   d
       |\  |
       e i y
       | |
       r n
         |
         g

Approach: An efficient approach is to treat every character of the input key as an
individual trie node and insert it into the trie. Note that the children of a node are
an array of pointers (or references) to the next-level trie nodes. The key character
acts as an index into the array of children. If the input key is new or an extension
of an existing key, we need to construct the non-existing nodes of the key and mark
the last node as the end of a word. If the input key is a prefix of an existing key in
the trie, we simply mark the last node of the key as the end of a word. The key length
determines the trie depth.
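The approach above can be sketched in Python as follows. This is a minimal illustration that assumes keys contain only the lowercase letters a to z; the names TrieNode, insert, search and ALPHABET_SIZE are assumptions made for this sketch.

```python
ALPHABET_SIZE = 26  # assumption: keys use lowercase letters 'a'..'z' only


class TrieNode:
    def __init__(self):
        # Array of references to the next-level nodes, indexed by character.
        self.children = [None] * ALPHABET_SIZE
        self.is_end_of_word = False


def char_index(ch):
    return ord(ch) - ord("a")


def insert(root, key):
    node = root
    for ch in key:
        i = char_index(ch)
        if node.children[i] is None:   # construct the non-existing node
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_end_of_word = True         # mark the last node as end of word


def search(root, key):
    node = root
    for ch in key:
        node = node.children[char_index(ch)]
        if node is None:
            return False
    return node.is_end_of_word


root = TrieNode()
for word in ["cat", "there", "caller", "their", "calling", "bat"]:
    insert(root, word)
print(search(root, "cat"))    # True
print(search(root, "call"))   # False: "call" is only a prefix so far
insert(root, "call")          # prefix of an existing key: only the mark is set
print(search(root, "call"))   # True
```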

Trie deletion


Here is an algorithm for deleting a key from a trie.

During the delete operation we delete the key in a bottom-up manner using recursion.
The following are the possible conditions when deleting a key from a trie:

1. The key may not be in the trie. The delete operation should not modify the trie.

2. The key is present as a unique key (no part of the key contains another key as a
prefix, nor is the key itself a prefix of another key in the trie). Delete all its nodes.

3. The key is a prefix of another, longer key in the trie. Unmark the end-of-word flag
on the last node of the key.

4. The key is present in the trie and has at least one other key as a prefix. Delete
nodes from the end of the key up to, but not including, the last node of the longest
prefix key.

Time Complexity: The time complexity of the deletion operation is O(n), where n is
the key length.
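The four conditions can be handled by one small recursive routine. The Python below is only a sketch: it stores children in a dictionary instead of a fixed array to keep the code short, and the names TrieNode, insert, search, delete and prune are assumptions for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}           # character -> TrieNode
        self.is_end_of_word = False


def insert(root, key):
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end_of_word = True


def search(root, key):
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end_of_word


def prune(node, key, depth):
    """Unmark the key bottom-up; return True if node is now useless."""
    if depth == len(key):
        node.is_end_of_word = False          # condition 3: just unmark
    else:
        ch = key[depth]
        if prune(node.children[ch], key, depth + 1):
            del node.children[ch]            # conditions 2 and 4: drop unused nodes
    # A node can be removed when it ends no word and has no children left.
    return not node.is_end_of_word and not node.children


def delete(root, key):
    if not search(root, key):
        return False                         # condition 1: trie is not modified
    prune(root, key, 0)
    return True


root = TrieNode()
for word in ["an", "and", "any", "bat"]:
    insert(root, word)
print(delete(root, "an"), search(root, "and"))   # True True
print(delete(root, "bat"), "b" in root.children) # True False
print(delete(root, "xyz"))                       # False
```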

Advantages of Trie Data Structure

A trie is a tree that stores strings. The maximum number of children of a node is
equal to the size of the alphabet. A trie supports search, insert and delete
operations in O(L) time, where L is the length of the key.

Hashing: In hashing, we convert the key to a small value and that value is used to
index data. Hashing supports search, insert and delete operations in O(L) time on
average.

Self-Balancing BST: The time complexity of the search, insert and delete
operations in a self-balancing Binary Search Tree (BST) (like a Red-Black Tree, AVL
Tree, Splay Tree, etc.) is O(L * log n), where n is the total number of words and L is
the length of the word. The advantage of self-balancing BSTs is that they maintain
order, which makes operations like minimum, maximum, closest (floor or ceiling) and
kth largest faster.

Why Trie? :-


1. With a trie, we can insert and find strings in O(L) time, where L represents the
length of a single word. This is faster than a BST, which needs O(L * log n). It can
also be faster than hashing because of the way it is implemented: we do not need to
compute any hash function, and no collision handling is required (as it is in open
addressing and separate chaining).

2. Another advantage of a trie is that we can easily print all words in alphabetical
order, which is not easily possible with hashing.

3. We can efficiently do prefix search (or auto-complete) with Trie.

Issues with Trie :-


The main disadvantage of tries is that they need a lot of memory for storing the
strings. Each node holds many node pointers (as many as there are characters in the
alphabet). If space is a concern, a Ternary Search Tree can be preferred for
dictionary implementations. In a Ternary Search Tree, the time complexity of the
search operation is O(h), where h is the height of the tree. Ternary Search Trees also
support other operations supported by tries, like prefix search, alphabetical order
printing, and nearest neighbor search.
The final conclusion regarding the trie data structure is that tries are fast but
require a large amount of memory for storing the strings.

APPLICATIONS OF TRIES

String handling and processing is one of the most important topics for programmers.
Many real-time applications are based on string processing, such as:

1. Search engine results optimization
2. Data analytics
3. Sentiment analysis

A data structure that is very important for string handling is the trie, which is
based on the prefixes of strings.


TYPES OF TRIES

Tries are classified into three categories:

1. Standard Tries
2. Compressed Tries
3. Suffix Tries

STANDARD TRIES

A standard trie has the following properties:

• It is an ordered tree-like data structure.
• Each node (except the root node) in a standard trie is labeled with a character.
• The children of a node are in alphabetical order.
• Each node or branch represents a possible character of keys or words.
• Each node or branch may have multiple branches.
• The last node of every key or word is used to mark the end of the word.
• The path from the root to an external node yields a string of S.
Below is the illustration of the Standard Trie

Standard Trie Insertion

Strings={ a,an,and,any}


Example of Standard Trie

Standard trie for the following strings


S={ bear, bell, bid, bull, buy, sell, stock, stop}

Handling Keys(strings)

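• When a key is a prefix of another key
How can we know that "an" is a word?
Example: an, and
The last node of each key carries an end-of-word mark, so the node for "n" is marked
even though it is not a leaf.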


Standard Trie Searching

A search is a hit when the node reached at the end of the search carries the $ symbol.

Standard Trie Deletion

To perform the deletion, the following cases exist:

1. Word not found
Return false
2. Word exists as a standalone word
I. It is part of another node

Example:

II. It is not part of any other node

EXAMPLE

3. Word exists as a prefix of another word.

COMPRESSED TRIE

A compressed trie has the following properties:

1. A compressed trie is an advanced version of the standard trie.

2. Each node (except the leaf nodes) has at least 2 children.

3. It is used to achieve space optimization.

4. To derive a compressed trie from a standard trie, chains of redundant nodes are
compressed.

5. It consists of grouping, re-grouping and un-grouping of the characters of keys.

6. While performing the insertion operation, it may be required to un-group already
grouped characters.

7. While performing the deletion operation, it may be required to re-group already
grouped characters.

Compressed trie is constructed from standard trie
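One way to picture the compression of redundant chains is the Python sketch below. It builds a standard trie for the string set S used earlier and then merges every chain of single-child nodes into one edge label. The names Node, build_standard_trie and compress are assumptions for illustration, and a node that ends a word is deliberately not merged away so that prefix keys survive.

```python
class Node:
    def __init__(self):
        self.children = {}    # edge label -> child Node
        self.is_end = False   # True if a stored word ends at this node


def build_standard_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_end = True
    return root


def compress(node):
    """Merge chains of redundant (single-child, non-terminal) nodes in place."""
    merged = {}
    for label, child in node.children.items():
        # Follow the chain while the child has exactly one child and ends no word.
        while len(child.children) == 1 and not child.is_end:
            next_label, next_child = next(iter(child.children.items()))
            label += next_label
            child = next_child
        compress(child)
        merged[label] = child
    node.children = merged


S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
root = build_standard_trie(S)
compress(root)
print(sorted(root.children))                             # ['b', 's']
print(sorted(root.children["b"].children))               # ['e', 'id', 'u']
print(sorted(root.children["s"].children))               # ['ell', 'to']
print(sorted(root.children["s"].children["to"].children))  # ['ck', 'p']
```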


Storage of Compressed Trie

A compressed trie can be stored in O(s) space, where s = |S| is the number of strings
in S, by using O(1)-space index ranges at the nodes.

In the representation below, each node is represented by a triple (i, j, k):
i is the index of the string in S
j is the starting index of the characters within string S[i]
k is the ending index of the characters within string S[i]
Ex: In the given diagram, the node (4,2,3) holds the characters "ll", which belong to
S[4], so i=4. The index of the first "l" in S[4] is 2, so j=2, and the ending index is
3, so k=3.
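A tiny sketch of how such a triple is decoded is shown below. The ordering of the array S is an assumption made here only so that S[4] is "bull", as the example requires; the diagram in the notes may use a different array. The helper name node_label is also hypothetical.

```python
# Assumed ordering of S (hypothetical), chosen so that S[4] == "bull".
S = ["see", "bear", "sell", "stock", "bull", "buy", "bid", "hear", "bell", "stop"]


def node_label(strings, triple):
    """Recover the characters of a node from its (i, j, k) index range."""
    i, j, k = triple
    return strings[i][j:k + 1]   # k is inclusive


print(node_label(S, (4, 2, 3)))  # ll
```

Each node stores three integers regardless of how long its label is, which is why the index ranges take O(1) space per node.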


SUFFIX TRIES

A suffix trie has the following properties:

1. A suffix trie is a compressed trie for all the suffixes of the text.
2. Suffix tries are a space-efficient data structure for storing a string that allows
many kinds of queries to be answered quickly.

Example

Let us consider the example text "soon$"

After ordering alphabetically, the trie looks like
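For the text "soon$", the suffixes and a simple substring query can be sketched in Python as follows. For brevity this sketch builds an uncompressed trie of the suffixes as nested dictionaries, which is a simplification of the compressed form described above; the names suffixes, build_suffix_trie and contains are assumptions for illustration.

```python
def suffixes(text):
    """All suffixes of text, from longest to shortest."""
    return [text[i:] for i in range(len(text))]


def build_suffix_trie(text):
    """Insert every suffix of text into an (uncompressed) trie of nested dicts."""
    root = {}
    for suffix in suffixes(text):
        node = root
        for ch in suffix:
            node = node.setdefault(ch, {})
    return root


def contains(root, pattern):
    """A pattern occurs in the text if it is a prefix of some suffix."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


print(sorted(suffixes("soon$")))   # ['$', 'n$', 'on$', 'oon$', 'soon$']
trie = build_suffix_trie("soon$")
print(contains(trie, "oo"))        # True
print(contains(trie, "no"))        # False
```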


Advantages of suffix tries

1. Insertion is faster compared to the hash table
2. Lookup is faster than the hash table implementation
3. There are no collisions of different keys in tries

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


lOMoARc PSD|18 878400

www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

UNIT - V

Pattern Matching and Tries: Pattern matching algorithms-Brute force, the Boyer –Moore
algorithm, the Knuth-Morris-Pratt algorithm, Standard Tries, Compressed Tries, Suffix tries.

Pattern Matching

Pattern searching is an important problem in computer science. When we do search for


a string in notepad/word file or browser or database, pattern searching algorithms are
used to show the search results.
A typical problem statement would be-
Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[],
chartxt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m.
Examples:
Input: txt[] = "THIS IS A TEST
TEXT"pat[] = "TEST"
Output: Pattern found at index 10
Input: txt[] =
"AABAACAADAABAABA"
pat[] = "AABA"
Output: Pattern found at
index 0Pattern found at
index 9 Pattern found at
index 12
Different Types of Pattern Matching Algorithms
1. Navie Based Algorithm or Brute Force Algorithm
2. Boyer Moore Algorithm
3. Knuth-Morris Pratt (KMP) Algorithm

Navie Based Algorithm or Brute Force Algorithm

When we talk about string matching, there is a simple technique that anyone can come up
with. Starting from the first letter of the text and the first letter of the pattern, check
whether the two letters are equal. If they are, check the second letters of the text and
the pattern. If they are not, move the first letter of the pattern to the second letter of
the text and compare the two letters again.

The Brute Force string matching algorithm works exactly like that, which is why it is also
called the Naive string matching algorithm. Naive means basic.

Brute Force Algorithm

do
    if (text letter == pattern letter)
        compare next letter of pattern to next letter of text
    else
        move pattern down text by one letter
while (entire pattern not found and end of text not reached)
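The pseudocode above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name brute_force_search is an assumption, and unlike the pseudocode it keeps going after a match so that it reports every occurrence, as the problem statement asks.

```python
def brute_force_search(txt, pat):
    """Return the start index of every occurrence of pat in txt."""
    n, m = len(txt), len(pat)
    found = []
    for s in range(n - m + 1):          # s is the current alignment of the pattern
        j = 0
        while j < m and txt[s + j] == pat[j]:
            j += 1                       # letters match: compare the next pair
        if j == m:                       # the entire pattern matched at alignment s
            found.append(s)
        # on a mismatch the loop simply moves the pattern down by one letter
    return found


for index in brute_force_search("AABAACAADAABAABA", "AABA"):
    print("Pattern found at index", index)
```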

Let's learn this method using an example.

EXAMPLE 1

Let our text (T) be

THIS IS A SIMPLE EXAMPLE

and our pattern (P) be

SIMPLE

In the figure, red boxes mark pattern letters that mismatch the letters of the text and
green boxes mark pattern letters that match the letters of the text.

In the first row we check whether the first letter of the pattern matches the first letter
of the text. It is a mismatch, because "S" is the first letter of the pattern and "T" is the
first letter of the text. We then move the pattern by one position, as shown in the second row.


Then we check the first letter of the pattern against the second letter of the text. It is
also a mismatch. We continue checking and moving in this way. In the fourth row the first
letter of the pattern matches the text, so we do not move the pattern; instead we advance
to the next letter of the pattern. We move the pattern by one position only when we find
a mismatch. In the last row, all the letters of the pattern match consecutive letters of
the text.

Example 2

Running Time Analysis Of Brute Force String Matching Algorithm

Worst Case

Given a pattern M characters in length, and a text N characters in length...


• Worst case: compares pattern to each substring of text of length M.
For example, M=5.

• Total number of comparisons: M (N-M+1)


• Worst case time complexity: Ο(MN)
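The worst-case count can be checked with a small Python sketch. The helper count_comparisons and the sample strings are assumptions chosen for illustration: a text of N-1 letters "A" followed by "H", and a pattern of M-1 letters "A" followed by "H", so that every alignment compares all M letters.

```python
def count_comparisons(txt, pat):
    """Count letter comparisons made by the brute force method over all alignments."""
    n, m = len(txt), len(pat)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if txt[s + j] != pat[j]:
                break                    # mismatch: move on to the next alignment
    return comparisons


N, M = 28, 5
txt = "A" * (N - 1) + "H"
pat = "A" * (M - 1) + "H"
worst = count_comparisons(txt, pat)
print(worst, M * (N - M + 1))
```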

Best case

Given a pattern M characters in length, and a text N characters in length...


• Best case if pattern found: Finds pattern in first M positions of text.
For example, M=5.
AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAA 5 comparisons made
• Total number of comparisons: M
• Best case time complexity: Ο(M)
• Best case if pattern not found: Always mismatch on first character. For example, M=5.
• Total number of comparisons: N-M+1, one per alignment, which is at most N
• Best case time complexity: Ο(N)
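Both best cases can be reproduced with a short Python sketch. The helper comparisons_until_first_match and the pattern "OOOOH" are assumptions for illustration. The sketch stops at the first match, and in the not-found case it makes one comparison per alignment, that is N-M+1 comparisons, which is bounded by N.

```python
def comparisons_until_first_match(txt, pat):
    """Return (index of first match or -1, number of letter comparisons made)."""
    n, m = len(txt), len(pat)
    comparisons = 0
    for s in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if txt[s + j] != pat[j]:
                break
            j += 1
        if j == m:
            return s, comparisons        # stop as soon as the pattern is found
    return -1, comparisons


txt = "A" * 27 + "H"                     # N = 28
print(comparisons_until_first_match(txt, "AAAAA"))   # pattern found at the start
print(comparisons_until_first_match(txt, "OOOOH"))   # first letter never matches
```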


Advantages

1. Very simple technique that does not require any preprocessing. Therefore the
total running time is the same as its matching time.

Disadvantages

1. Very inefficient method, because the pattern moves by only one position at a
time.

Boyer Moore Algorithm for Pattern Searching

The B-M algorithm takes a backward approach: the pattern string (P) is aligned with the
start of the text string (T), and the characters of the pattern are then compared from
right to left, beginning with the rightmost character.

If a text character is compared that does not occur in the pattern, no match can be found
by comparing any further characters at this position, so the pattern can be shifted
completely past the mismatching character.

To determine the possible shifts, the B-M algorithm uses two preprocessing strategies
simultaneously. Whenever a mismatch occurs, the algorithm computes a shift using
both strategies and selects the longer one. Thus it makes use of the most efficient
strategy for each individual case.

NOTE : Boyer Moore algorithm starts matching from the last character of the pattern.

The two strategies are called the heuristics of B-M, as they are used to reduce the search.
They are:

1) Bad Character Heuristic
2) Good Suffix Heuristic

Bad Character Heuristic

The idea of the bad character heuristic is simple. The character of the text which doesn’t
match the current character of the pattern is called the Bad Character. Upon a
mismatch, we shift the pattern until either:
1) The mismatch becomes a match, or
2) Pattern P moves past the mismatched character.


Case 1: Mismatch becomes a match

We look up the position of the last occurrence of the mismatching character in the
pattern. If the mismatching character exists in the pattern, we shift the pattern so that
this occurrence is aligned with the mismatching character in text T.

case 1

Explanation: In the above example, we got a mismatch at position 3. Here our
mismatching character is “A”. Now we search for the last occurrence of “A” in the pattern.
We find “A” at position 1 in the pattern (displayed in blue), and this is its last occurrence.
We therefore shift the pattern by 2 positions so that the “A” in the pattern is aligned with
the “A” in the text.

Case 2: Pattern moves past the mismatched character

We look up the position of the last occurrence of the mismatching character in the
pattern. If the character does not exist in the pattern, we shift the pattern past the
mismatching character.

case2


Explanation: Here we have a mismatch at position 7. The mismatching character “C”
does not exist in the pattern before position 7, so we shift the pattern past position 7,
and eventually in the above example we get a perfect match of the pattern (displayed in
green). We do this because “C” does not exist in the pattern, so every shift before
position 7 would give a mismatch and the search would be fruitless.

Problem in Bad Character Heuristic

In some cases the Bad Character Heuristic produces a negative shift. For example:

This means we need some extra information to produce a shift on encountering a bad
character. The information is the last position of every character in the pattern, and
also the set of characters used in the pattern.
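A minimal Python sketch of a search that uses only the bad character heuristic is given below. The names last_occurrence_table and bad_character_search are assumptions for this example. The sketch guards against a zero or negative shift by always moving at least one position, and after a full match it simply shifts by one, which is a simplification rather than the shift a full Boyer-Moore implementation would use.

```python
def last_occurrence_table(pat):
    """Map each character of the pattern to the index of its last occurrence."""
    return {ch: i for i, ch in enumerate(pat)}


def bad_character_search(txt, pat):
    """Return all match positions using right-to-left comparison and bad character shifts."""
    n, m = len(txt), len(pat)
    last = last_occurrence_table(pat)
    found = []
    s = 0                                # current alignment of the pattern in the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pat[j] == txt[s + j]:
            j -= 1                       # compare from the rightmost character
        if j < 0:
            found.append(s)
            s += 1                       # simplification: shift by one after a match
        else:
            bad = txt[s + j]
            # Case 1: align the last occurrence of the bad character with the text.
            # Case 2: the character is absent (-1), so move the pattern past it.
            # max(1, ...) avoids the negative shift described above.
            s += max(1, j - last.get(bad, -1))
    return found


print(bad_character_search("AABAACAADAABAABA", "AABA"))
```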

2. Good Suffix Heuristic

Let t be the substring of text T which is matched with a substring of pattern P. Now we
shift the pattern until one of the following holds:
1) Another occurrence of t in P is matched with t in T.
2) A prefix of P matches a suffix of t.
3) P moves past t.

Case 1: Another occurrence of t in P matched with t in T

Pattern P might contain a few more occurrences of t. In such a case, we try to shift the
pattern to align that occurrence with t in text T. For example:

Explanation: In the above example, a substring t of text T matched with
pattern P (in green) before the mismatch at index 2. Now we search for another occurrence
of t (“AB”) in P. We find an occurrence starting at position 1 (in yellow background),
so we shift the pattern right by 2 positions to align t in P with t in T. This is the weak
rule of the original Boyer-Moore algorithm.

Case 2: A prefix of P which matches a suffix of t in T

We will not always find another occurrence of t in P. Sometimes there is no such
occurrence at all. In such cases we can search for some suffix of t that matches some
prefix of P and try to align them by shifting P. For example:


Explanation: In the above example, t (“BAB”) matched with P (in green) at
index 2-4 before the mismatch. Because there exists no other occurrence of t in P, we
search for some prefix of P which matches some suffix of t. We find the prefix
“AB” (in the yellow background) starting at index 0, which matches not the whole of t but
the suffix “AB” of t starting at index 3. So we shift the pattern by 3 positions to align
the prefix with the suffix.

Case 3: P moves past t

If neither of the above two cases applies, we shift the pattern past t. For example:


Explanation: In the above example, there exists no other occurrence of t (“AB”) in P and
there is also no prefix of P which matches a suffix of t. In that case we can never
find a perfect match before index 4, so we shift P past t, i.e. to index 5.

Strong Good suffix Heuristic

Suppose substring q = P[i to n] got matched with t in T and c = P[i-1] is the
mismatching character. Now, unlike case 1, we search for an occurrence of t in P which is
not preceded by character c. The closest such occurrence is then aligned with t in T by
shifting pattern P. For example:

Explanation: In the above example, q = P[7 to 8] got matched with t in T. The
mismatching character c is “C” at position P[6]. If we start searching for t in P, we
get the first occurrence of t starting at position 4. But this occurrence is preceded by “C”,
which is equal to c, so we skip it and carry on searching. At position 1 we get
another occurrence of t (in the yellow background). This occurrence is preceded by “A”
(in blue), which is not equal to c. So we shift pattern P by 6 positions to align this
occurrence with t in T. We do this because we already know that the character c =
“C” causes the mismatch. Any occurrence of t preceded by c would cause the same
mismatch again when aligned with t, so it is better to skip it.

Preprocessing for Good suffix heuristic

As a part of preprocessing, an array shift is created. Each entry shift[i] contains the
distance the pattern will shift if a mismatch occurs at position i-1. That is, the suffix of
the pattern starting at position i is matched and a mismatch occurs at position i-1.
Preprocessing is done separately for the strong good suffix rule and for case 2 discussed above.


1) Preprocessing for Strong Good Suffix

Before discussing preprocessing, let us first discuss the idea of a border. A border is a
substring which is both a proper suffix and a proper prefix. For example, in the string
“ccacc”, “c” is a border and “cc” is a border, because each appears at both ends of the
string, but “cca” is not a border.
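The definition of a border can be checked directly with a few lines of Python. The helper name borders is an assumption for this illustration; it simply tests every proper prefix against the suffix of the same length.

```python
def borders(s):
    """Return every non-empty proper prefix of s that is also a suffix of s."""
    return [s[:k] for k in range(1, len(s)) if s[:k] == s[-k:]]


print(borders("ccacc"))
```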

As a part of preprocessing, an array bpos (border position) is calculated. Each
entry bpos[i] contains the starting index of the border for the suffix starting at index i
in the given pattern P.
The empty suffix φ beginning at position m has no border, so bpos[m] is set to m+1,
where m is the length of the pattern.
The shift position is obtained from the borders which cannot be extended to the left.
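One common way to code this preprocessing is sketched below in Python. The function names are assumptions for this example, and the sketch uses only the good suffix heuristic (no bad character table), so it should be read as an illustration of how bpos and shift are filled rather than as a complete Boyer-Moore implementation.

```python
def preprocess_strong_suffix(shift, bpos, pat):
    """Fill bpos and the shifts given by the strong good suffix rule."""
    m = len(pat)
    i, j = m, m + 1
    bpos[i] = j                          # the empty suffix has no border
    while i > 0:
        # The border cannot be extended to the left: record a shift and
        # fall back to the next shorter border.
        while j <= m and pat[i - 1] != pat[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = bpos[j]
        i -= 1
        j -= 1
        bpos[i] = j


def preprocess_case2(shift, bpos, pat):
    """Fill the remaining shifts using prefixes of the pattern (case 2)."""
    m = len(pat)
    j = bpos[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = bpos[j]


def good_suffix_search(txt, pat):
    """Return all match positions using only good suffix shifts."""
    n, m = len(txt), len(pat)
    bpos = [0] * (m + 1)
    shift = [0] * (m + 1)
    preprocess_strong_suffix(shift, bpos, pat)
    preprocess_case2(shift, bpos, pat)
    found = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pat[j] == txt[s + j]:
            j -= 1                       # compare from right to left
        if j < 0:
            found.append(s)
            s += shift[0]
        else:
            s += shift[j + 1]            # mismatch at j: suffix from j+1 matched
    return found


print(good_suffix_search("ABAAAABAACD", "ABA"))
```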

Complexity of Boyer Moore Algorithm

This algorithm takes O(mn) in the worst case and O(n log(m)/m) in the average case,
which is sublinear in the sense that not all characters are inspected.

Applications

This algorithm is highly useful in tasks like recursively searching files for virus
patterns, searching databases for keys or data, text and word processing, and any other
task that requires handling large amounts of data at very high speed.

Knuth-Morris Pratt (KMP) Algorithm for Pattern Searching

The Naive pattern searching algorithm doesn’t work well in cases where we see many
matching characters followed by a mismatching character. Following are some
examples.

txt[] = "AAAAAAAAAAAAAAAAAB"
pat[] = "AAAAB"

txt[] = "ABABABCABABABCABABABC"
pat[] = "ABABAC" (not a worst case, but a bad case for Naive)

The KMP algorithm is one of the most popular pattern matching algorithms. KMP stands
for Knuth-Morris-Pratt. The KMP algorithm was invented by Donald Knuth and Vaughan
Pratt together, and independently by James H. Morris, in the year 1970. In the year 1977,
all three jointly published the KMP algorithm.


The KMP algorithm was the first linear time complexity algorithm for string matching.
It is one of the string matching algorithms used to find a "Pattern" in a "Text".

This algorithm compares character by character from left to right. But whenever a
mismatch occurs, it uses a preprocessed table called the "Prefix Table" to skip character
comparisons while matching. Sometimes the prefix table is also known as the LPS Table.
Here LPS stands for "Longest proper Prefix which is also Suffix".

Steps for Creating LPS Table (Prefix Table)

• Step 1 - Define a one dimensional array with the size equal to the length of the Pattern
(LPS[size]).
• Step 2 - Define variables i & j. Set i = 0, j = 1 and LPS[0] = 0.
• Step 3 - Compare the characters at Pattern[i] and Pattern[j].
• Step 4 - If both are matched, then set LPS[j] = i+1 and increment both i & j values by one.
Go to Step 3.
• Step 5 - If both are not matched, then check the value of variable 'i'. If it is '0', then
set LPS[j] = 0 and increment the 'j' value by one; if it is not '0', then set i = LPS[i-1].
Go to Step 3.
• Step 6 - Repeat the above steps until all the values of LPS[] are filled.
Let us use the above steps to create the prefix table for a pattern...
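The six steps translate almost line by line into Python. The function name build_lps and the sample pattern "ABABAC" (taken from the bad-case example earlier in this section) are choices made for this sketch.

```python
def build_lps(pattern):
    """Build the LPS (prefix) table by following Steps 1 to 6."""
    m = len(pattern)
    lps = [0] * m                        # Step 1: one entry per pattern character
    i, j = 0, 1                          # Step 2: LPS[0] is already 0
    while j < m:                         # Step 6: repeat until the table is filled
        if pattern[i] == pattern[j]:     # Steps 3 and 4
            lps[j] = i + 1
            i += 1
            j += 1
        elif i == 0:                     # Step 5, i is 0
            lps[j] = 0
            j += 1
        else:                            # Step 5, i is not 0
            i = lps[i - 1]
    return lps


print(build_lps("ABABAC"))
```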


How to use LPS Table

We use the LPS table to decide how many characters can be skipped when a mismatch
occurs.
When a mismatch occurs, check the LPS value of the character just before the mismatched
character in the pattern. If it is '0', start comparing the first character of the
pattern with the mismatched character in the text. If it is not '0', start comparing
the pattern character at the index equal to that LPS value with the mismatched
character in the text. If the mismatch is at the first character of the pattern, there is
no previous character, so move on to the next character of the text.
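A minimal Python sketch of the search phase is shown below. The names build_lps and kmp_search are assumptions for this example. The text index i never moves backwards; on a mismatch only the pattern index j is reset from the LPS table, and i advances only when the mismatch is at the first character of the pattern.

```python
def build_lps(pattern):
    """LPS table: longest proper prefix that is also a suffix, for each prefix."""
    lps = [0] * len(pattern)
    i, j = 0, 1
    while j < len(pattern):
        if pattern[i] == pattern[j]:
            lps[j] = i + 1
            i += 1
            j += 1
        elif i == 0:
            j += 1
        else:
            i = lps[i - 1]
    return lps


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    lps = build_lps(pattern)
    found = []
    i = j = 0                            # i indexes the text, j the pattern
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:                   # whole pattern matched
                found.append(i - m)
                j = lps[j - 1]
        elif j > 0:
            j = lps[j - 1]               # retry the same text character
        else:
            i += 1                       # mismatch at first pattern character
    return found


print(kmp_search("AABAACAADAABAABA", "AABA"))
```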

How the KMP Algorithm Works

Let us see a working example of KMP Algorithm to find a Pattern in a Text

EXAMPLE 1


Example 2


KMP ALGORITHM COMPLEXITY

O(m): time to compute the prefix function (LPS) values
O(n): time to compare the pattern to the text
O(n+m): total time taken by the KMP algorithm
Advantages

• The running time of the KMP algorithm is O(n+m), which is very fast.
• The algorithm never needs to move backwards in the input text T. This makes the
algorithm good for processing very large files.
Disadvantages

• Does not work as well as the size of the alphabet increases, because mismatches
become more likely.


B) ASSIGNMENT QUESTIONS

What is a Data Structure?
What are linear and non-linear data structures?
What is a Stack and where can it be used?
What is a Queue, how is it different from a stack, and how is it implemented?
What are Infix, Prefix and Postfix notations?
What is a Linked List and what are its types?
Which data structures are used for BFS and DFS of a graph?
How to implement a stack using a queue?
How to implement a queue using a stack?
Which data structure should be used for implementing an LRU cache?
How to check if a given Binary Tree is a BST or not?

C) Short and Long Answer Questions with Bloom's Taxonomy Levels

Long answers
What is Data Structure: Types, Classifications and Applications
Introduction to Linear Data Structures
Data Structure Alignment: How is data arranged and accessed in Computer Memory?
Static Data Structure vs Dynamic Data Structure
Short Answers

What is Data Structure? Explain.

Describe the types of Data Structures.

List the areas of application of Data Structures.

What is the difference between file structure and storage structure?

List the data structures which are used in RDBMS, Network Data Model,
and Hierarchical Data Model.

Which data structure is used to perform recursion?


What is a Stack?

List the area of applications where stack data structure can be used?

d) Objectives
1. Cataloging in the process of creating database of the Library resources.
2. A programme is a set of ______ which are made to perform a well-designed task.
3. The open source software is a software for which ______ is open.
4. Local area network (LAN) connects computers and devices spread in an area of ______.
5. The switching technique provides path for data movement on network from ______
to destination.
6. Name the characteristic on the basis of which E-resources can be categorized.
a) Content and Accessibility
b) Print and Analogue
c) Online and print
d) Accessibility and dispatch
7. What does PDF stand for?
a) Printable defined format
b) Portable document format
c) Printable document file
d) Principal document format
8. Finding out all the relevant items on the stated topic is known as
a) High precision search
b) High recall search
c) Brief search
d) None of the above
