
Review Article

iMedPub Journals
http://www.imedpub.com

American Journal of Computer Science and Information Technology
2021, Vol. 9 No. 9: 112

Evaluating Efficiency of Some Exact String-Matching Algorithms on Large-Scale Genome

Osamh Alrouwab1*, Dheba Mansour2 and Mahmoud Gargotti3

1 Department of Biochemistry, Faculty of Medicine, University of Zawia, Zawia, Libya
2 Department of Zoology, Faculty of Sciences, Aljafra University, Almamura, Libya
3 Faculty of Medicine, Department of Microbiology, University of Zawia, Zawia, Libya

Corresponding author: Osamh Alrouwab, Department of Biochemistry, Faculty of Medicine, University of Zawia, Zawia, Libya
usamaerawab@gmail.com

Citation: Alrouwab O (2021) Evaluating Efficiency of Some Exact String-Matching Algorithms on Large-Scale Genome. Am J Compt Sci Inform Technol Vol.9 No.9: 112.

Abstract
Exact string-matching algorithms have become central to many bioinformatics tools. Despite the abundance and diversity of such algorithms, subjecting them to experimental run-time analysis remains critical. This study was conducted to evaluate the efficiency of ten exact string-matching algorithms on large-scale genomic sequences from a runtime perspective, in order to identify the algorithms best qualified to handle the short alphabet used for nucleic acid coding.

The methodology adopted for this study was the factorial experiment with Randomized Complete Block Design (FRCBD), under the influence of four independent parameters: four levels of pattern length, four levels of pattern index, two levels of programming language, and ten levels of algorithmic architecture. The execution time of the tested algorithms was measured in nanoseconds. One-way ANOVA and two-way ANOVA tests with the post-hoc Games-Howell test were used separately for statistical analysis. Two widely accepted programming languages, C# and JAVA, were used to assess the possible effect of the programming language on algorithm performance.

The one-way ANOVA results revealed that the Backward-Oracle-Matching (BOM), Zhu-Takaoka (ZT), and Horspool's (HP) algorithms exhibited the highest performance, in that order. These algorithms demonstrated an efficiency of up to 250% higher than other algorithms. The results of the two-way ANOVA revealed a significant effect of the programming language on execution time, with no effect of pattern length or pattern index. The combination of the C# programming language and the Backward-Oracle-Matching algorithm produced the most effective performance on genomic sequences.

Keywords: Exact-string matching algorithm; Factorial design; One-way ANOVA; Two-way ANOVA; Games-Howell test

Received: September 24, 2021; Accepted: October 08, 2021; Published: October 15, 2021

© Under License of Creative Commons Attribution 3.0 License | This article is available in: http://colorectal-cancer.imedpub.com/archive.php

Introduction
String matching is an essential problem-solving technique encountered by specialists from various disciplines, e.g. data mining, artificial intelligence, and bioinformatics [1]. Many algorithms and methods have been proposed for pattern recognition, and there are abundant applications and online servers that can achieve precise string matching on biological data [2]. The methodologies behind pattern recognition differ greatly, owing to the obvious variations in algorithmic architecture [3]. Generally, string-matching algorithms can be broadly classified into five distinct classes: (a) algorithms that resolve the problem by character comparisons, (b) algorithms that depend on the use of automatic probabilistic simulation, (c) non-probabilistic simulation algorithms, (d) constant-space algorithms, and (e) real-time algorithms [4]. The most traditional and simplest matching approach is to compare the pattern characters with the characters in the target text. This approach is implemented by the so-called Naïve or Brute-Force algorithm [5], which pre-processes neither the text nor the pattern. Its worst-case time complexity is O(nm), where m and n denote the pattern and text lengths respectively. Subsequently, numerous algorithms have made considerable improvements on the Brute-Force running time. The worst-case lower bound of the string-matching problem is O(n). The first algorithm to reach the bound was given by Morris and Pratt [6] and later improved by Knuth. Linear algorithms based on bit-parallelism were introduced by Baeza-Yates and Manber. Xian-Feng et al. presented the KMPBS algorithm, a hybrid algorithm based on the Boyer-Moore (BM) and Knuth-Morris-Pratt (KMP) algorithms. The text T is scanned from left to right for the given pattern P of length m. When searching, the very last character of P is compared to the corresponding character of text T, and the KMP algorithm is then used to compare the remainder of the characters if there is a match [7-29]. Cao et al. [7] formulated a character-based string-matching algorithm that computes the statistical likelihood of each English letter in the pattern string based on its unique position in the pattern string. To calculate the mathematical likelihood and dynamic condition of each character in the pattern string, the suggested methodology utilizes optimization based on a high decision. Hakak et al. presented a novel exact string-matching methodology in a research paper entitled "A new split based searching for exact pattern matching for natural texts" [8-10]. In this technique, the given pattern is split into two chunks. To enhance the search strategy, only the second chunk of the pattern is searched against the given text using a brute-force methodology. When the second chunk is detected, the first chunk is directly mapped based on the location of the second chunk.

The amount of biological text-based information collected these days is increasing at a rising pace [11]. Hence, answering the question of which algorithm can be relied upon, from the perspective of reliability and productivity, among the large number of available algorithms has become imperative. The methodology adopted for this experiment was the so-called factorial experiment with FRCBD, to cope with the interaction between the predetermined factors [12]. This experiment was conceived while building the Bioinformatics library. The question then was which exact string-matching algorithm is the most suitable for processing biological data. Due to time constraints at that period, the Boyer-Moore algorithm was implemented, and it was not possible to make a comparison between the algorithms. The roadmap for this study was to assess the productivity and effectiveness of some exact string-matching algorithms from a run-time perspective. For reliability, four independent factors were adopted, namely the length and the index of the pattern, the algorithm type, and the programming language, in order to estimate the anticipated effect of those factors on the performance of each algorithm separately under the same experimental conditions [13]. Finally, the resulting data were interpreted statistically in an accurate and unbiased manner. More specifically, six hypotheses were examined in the current study: (a) Hypothesis 1: all the algorithms have equal run time on average; (b) Hypothesis 2: pattern length does not affect searching speed on average; (c) Hypothesis 3: algorithm type and pattern length are independent, or the real impact of their interaction is not prevalent; (d) Hypothesis 4: pattern position does not affect searching speed on average; (e) Hypothesis 5: algorithm type and pattern position are independent, or the real impact of their interaction is not prevalent; (f) Hypothesis 6: programming language does not affect searching speed on average. The main goal of this paper is to evaluate the performance of some exact string-matching algorithms in terms of processing time, as well as to measure the impact of some factors that may affect the search result on the limited alphabet used to encode Deoxyribonucleic Acid [14].

Materials and Methods

System participants
The dataset: The sequence of the Clostridium botulinum strain DFPST0029 chromosome (accession ID: NZ_CP028842) was retrieved from a publicly accessible online database, the Entrez Gene database of the National Center for Biotechnology Information, on January 29, 2021. The FASTA sequence file contained 3858511 DNA base pairs and was used as the target text [15].

Search patterns: Sixteen randomized pattern groups were configured to represent the contrast in position and length levels and to simulate realistic algorithm operating conditions (Table 1). Firstly, to investigate the impact of sequence length on the performance of the algorithm, the pattern length was extended over four levels. The mean of the pattern length was 3500 (SD=3341.66). Lastly, to gauge the influence of the pattern site on algorithm run time, the pattern position was set to four positions that represented the topographic regions of the target text. The mean of the pattern position was 921542.60 (SD=994136.98) [16].

Table 1: Pattern sets.
Position ID   Position index   Length ID   Pattern length
P1            0                L1          500 b.p
P2            286170           L2          1500 b.p
P3            1200000          L3          4000 b.p
P4            2200000          L4          8000 b.p

Software and operating system
The benchmarks were all performed on an Intel(R) Core(TM) i5-3470 CPU at 3.20 GHz with 16 GB DDR3 RAM. The operating system used for the benchmarks was Microsoft Windows 10 64-bit. Two language platforms were used to measure the anticipated effect of the programming language on algorithm execution time: C# from Microsoft using .NET Framework 4.5.2, and the JAVA(™) SE Development Kit 11.0.9 (JDK 11.0.9) from Oracle Corporation [17].

Measurements
Bio-statistical experimental design: The study was established in a factorial randomized complete block design fashion. All algorithms were subjected to uniform conditions in terms of the input sequence, the patterns, and the hardware [18]. The design of the experiment encompassed four independent factors: the programming language (two levels; C# or JAVA); the exact string-matching algorithm used to detect the search pattern (ten levels; Brute Force, Backward-Oracle-Matching, Raita, Horspool's, Rabin-Karp, Berry-Ravindran, Zhu-Takaoka, Simon, Maximal-Shift or Two-Way Algorithm); the pattern position (four index levels; P1=0, P2=286170, P3=1200000 or P4=2200000); and the pattern length (four levels; L1=500 b.p, L2=1500 b.p, L3=4000 b.p or L4=8000 b.p), resulting in a total of three hundred and twenty possible treatment combinations (N=320). Execution time for each algorithm, measured in nanoseconds, was assigned as the dependent response variable (Figure 1).
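The treatment count in the design can be checked by enumerating the factor levels. The following is a minimal Python sketch, not the authors' C# or JAVA harness: the factor levels are taken from the design description, while `time_search_ns` is a hypothetical helper that illustrates how one run could be timed in nanoseconds.

```python
import itertools
import time

# Factor levels as listed in the experimental design.
LANGUAGES = ["C#", "JAVA"]
ALGORITHMS = [
    "Brute Force", "Backward-Oracle-Matching", "Raita", "Horspool",
    "Rabin-Karp", "Berry-Ravindran", "Zhu-Takaoka", "Simon",
    "Maximal-Shift", "Two-Way",
]
POSITIONS = {"P1": 0, "P2": 286170, "P3": 1200000, "P4": 2200000}
LENGTHS = {"L1": 500, "L2": 1500, "L3": 4000, "L4": 8000}

# Full factorial: every combination of the four factors is one treatment.
treatments = list(itertools.product(LANGUAGES, ALGORITHMS, POSITIONS, LENGTHS))


def time_search_ns(search, text, pattern):
    """Hypothetical helper: time one call of search(text, pattern) in ns."""
    start = time.perf_counter_ns()
    found_at = search(text, pattern)
    elapsed = time.perf_counter_ns() - start
    return found_at, elapsed


if __name__ == "__main__":
    print(len(treatments))  # 2 * 10 * 4 * 4 treatment combinations
    # Toy timing run on a small synthetic sequence (not the real genome).
    text = "ACGT" * 1000
    pattern = text[100:150]
    print(time_search_ns(str.find, text, pattern))
```

Here `str.find` only stands in for a search routine; any function with the same `(text, pattern)` signature could be timed the same way.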


Figure 1 Flowchart of experiment workflow.

Statistical analyses of the data: All statistics were performed using the IBM SPSS 22 statistical package. To eliminate system interruptions, all unnecessary running processes were halted, and the same empirical restrictions were applied to all algorithms [19]. Data normality was tested using the Kolmogorov-Smirnov and Shapiro-Wilk tests. Means were compared using a one-way Analysis of Variance (ANOVA) and a two-way ANOVA. Finally, post-hoc comparisons were conducted with the Games-Howell test (Figure 2).

Figure 2 Flowchart of experiment statistical analyses.

Results

Interpretation of SPSS results
The ANOVA table produced by SPSS reports the p-value for the main effects (algorithm type, pattern position, pattern length, and programming language) and their interactions. An appropriate 95% Confidence Interval (CI) was given. A p-value of less than .05 implies a statistically significant main effect or interaction effect [20].

Preliminaries
This study intended to conduct a robust experiment to assess the effectiveness of several exact string-matching algorithms under distinct variables. Ten exact string-matching algorithms were subjected to unbiased tests. The data reported in this study were obtained from a factorial RCBD experiment and subjected to statistical factorial analysis to measure the main effects of the four independent factors [21].

Effect of algorithm type on execution time
The descriptive statistics with confidence intervals (95% CI) across the ten algorithm type groups are reported in Table 2. As shown, the Backward-Oracle-Matching algorithm had the numerically lowest mean execution time (M=24253831) and the Berry-Ravindran algorithm had the numerically highest mean execution time (M=65723113). To test Hypothesis 1 (all the algorithms have equal run time on average), a between-groups ANOVA was performed. Normality of the data distribution is a precondition for the ANOVA; a Kolmogorov-Smirnov test (Table 3) indicated that the data follow a normal distribution, D(320)=0.145, p=.200. Homogeneity of variances was assessed with Levene's test, F(9,310)=2.73, p=.004; since p<.05 this result points to unequal variances, which is consistent with the choice of the Games-Howell post-hoc test, which does not assume equal variances. The independent between-groups ANOVA produced a statistically significant effect, F(9,310)=2.60, p=.007, η_p^2=.07. Thus, all relevant conclusions are in favor of rejecting H0 for Hypothesis 1: performance was affected by algorithm type, and 7% of the variance in execution time was accounted for by algorithm type membership (Figure 3).

To assess the nature of the differences between the means of individual algorithm types, the statistically significant ANOVA was followed up with Games-Howell post-hoc comparisons. The results reveal that the mean for the Brute Force algorithm (M=40777000, SD=9102098) differed significantly from Horspool's (M=32954493, SD=7978283), Rabin-Karp (M=65079287, SD=12222770), Backward-Oracle-Matching (M=24253831, SD=16582280) and Simon (M=51696978, SD=14748130). Horspool's (M=32954493, SD=7978283) differed significantly from Rabin-Karp (M=65079287, SD=12222770) and Simon (M=51696978, SD=14748130) [22]. Zhu-Takaoka (M=31234328, SD=13230735) differed significantly from Rabin-Karp (M=65079287, SD=12222770) and Simon (M=51696978, SD=14748130). Raita (M=34778041, SD=11037655) differed significantly from Rabin-Karp (M=65079287, SD=12222770) and Simon (M=51696978, SD=14748130). Rabin-Karp (M=65079287, SD=12222770) differed significantly from Maximal-Shift (M=42857463, SD=21891320), Backward-Oracle-Matching (M=24253831, SD=16582280) and Simon (M=51696978, SD=14748130). Backward-Oracle-Matching (M=24253831, SD=16582280) differed significantly from Simon (M=51696978, SD=14748130). The mean differences were significant at the .05 level. No significant difference was found between the other group members [23].
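As a rough check on the effect size reported for Hypothesis 1, the F ratio and eta squared of a one-way ANOVA can be recomputed from sums of squares. The sketch below assumes that the algorithm sum of squares (5.998E+16) and the corrected total (8.537E+17) printed in the article's two-way ANOVA tables also apply to the one-way analysis; the function name is illustrative and not part of SPSS.

```python
def one_way_anova_summary(ss_between, ss_total, k_groups, n_total):
    """Return (df_between, df_within, F, eta squared) for a one-way ANOVA."""
    df_between = k_groups - 1          # 10 algorithms -> 9
    df_within = n_total - k_groups     # 320 runs - 10 groups -> 310
    ss_within = ss_total - ss_between  # residual sum of squares
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / ss_total
    return df_between, df_within, f_ratio, eta_squared


# Sums of squares assumed from the article's ANOVA tables.
df_b, df_w, f_ratio, eta_sq = one_way_anova_summary(
    ss_between=5.998e16, ss_total=8.537e17, k_groups=10, n_total=320
)

if __name__ == "__main__":
    print(df_b, df_w, round(f_ratio, 2), round(eta_sq, 2))
```

Under these assumptions the recomputed values agree with the reported F(9,310)=2.60 and with about 7% of the variance being attributable to algorithm type.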


Table 2: Performance results of tested algorithms (execution time in nanoseconds).

Algorithm                       N    Mean      Std. Deviation  Std. Error  95% CI lower bound  95% CI upper bound  Minimum   Maximum
BF (Brute Force)                32   40777000  9102097.6       1609038.7   37495344            44058656            26000000  73000000
HP (Horspool)                   32   32954494  7978283.5       1410374.6   30078016            35830972            22000000  57000000
BR (Berry-Ravindran)            32   65723113  138603343       24501841    15751279            115694947           27000000  8.00E+08
TW (Two-Way)                    32   55382887  69492399        12284637    30328206            80437569            29000000  4.00E+08
ZT (Zhu-Takaoka)                32   31234328  13230735        2338885.7   26464139            36004517            21000000  81000000
RT (Raita)                      32   34778041  11037655        1951200.2   30798542            38757540            18000000  66000000
RK (Rabin-Karp)                 32   65079287  12222771        2160701     60672509            69486066            24000000  77055899
MS (Maximal-Shift)              32   42857463  21891320        3869875.1   34964800            50750125            13007900  1.00E+08
BOM (Backward-Oracle-Matching)  32   24253831  16582280        2931360.7   18275282            30232381            12481500  82000000
SMN (Simon)                     32   51696978  14748130        2607125.6   46379710            57014246            37000000  1.00E+08
Total                           320  44473742  51732458        2891932.3   38784072            50163412            12481500  8.00E+08
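The standard error and confidence bounds in Table 2 follow from the mean, the standard deviation, and the group size. The sketch below recomputes them for the Backward-Oracle-Matching row; the critical value 2.0395 (two-sided 95% Student's t with 31 degrees of freedom) is hard-coded as an assumption, and the function name is illustrative.

```python
import math


def mean_confidence_interval(mean, sd, n, t_crit):
    """Return (standard error, lower bound, upper bound) for a group mean."""
    se = sd / math.sqrt(n)       # standard error of the mean
    half_width = t_crit * se     # half-width of the interval
    return se, mean - half_width, mean + half_width


# Assumed critical value: two-sided 95% Student's t with n - 1 = 31 df.
T_CRIT_DF31 = 2.0395

# Backward-Oracle-Matching row of Table 2 (mean, SD, N).
se, lower, upper = mean_confidence_interval(24253831, 16582280, 32, T_CRIT_DF31)

if __name__ == "__main__":
    print(round(se, 1), round(lower), round(upper))
```

With these inputs the result lands close to the tabulated standard error (2931360.7) and bounds (18275282 and 30232381); small differences come from rounding of the critical value.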

Table 3: Tests of normality.

          Kolmogorov-Smirnov            Shapiro-Wilk
          Statistic  df   Sig.          Statistic  df   Sig.
Duration  0.145      320  .200*         0.946      320  0.427

Figure 3 Performance for different algorithm types.

Correlation of algorithm types and pattern length
The execution times were submitted to a two-way ANOVA with four levels of pattern length (L1, L2, L3, and L4) and ten levels of algorithm type. The results indicate that algorithm type had a significant effect: 7.8% of the variance in execution time was explained by algorithm type (F(9,280)=2.62, p=.006, η_p^2=.078). Pattern length was responsible for only about 1% of the variance in run time (F(3,280)=0.85, p=.47, η_p^2=.01). The interaction between algorithm type and pattern length accounted for about 10% of the variance but was not significant (F(27,280)=1.10, p=.34, η_p^2=.10). All relevant outcomes are in favor of retaining H0 for Hypothesis 2 and Hypothesis 3: pattern length does not affect the speed of the algorithm (Figure 4), and algorithm type and pattern length are independent (Table 4) [24].

Figure 4 Impact of pattern length on algorithm run time.

Correlation of algorithm types and pattern position
A two-way analysis of variance revealed that the effect of pattern position was statistically insignificant (p>.05). Algorithm type explained 8% of the variance in execution time (F(9,280)=2.56, p=.008, η_p^2=.08). The main effect of pattern position yielded an effect size of .01, indicating that pattern location explained 1% of the variance in execution time (F(3,280)=.73, p=.53, η_p^2=.01). The interaction between the two factors was not significant (F(27,280)=.84, p=.70, η_p^2=.08): no significant combined effect of algorithm type and pattern position on execution time was observed, the interaction accounting for only 8% of the variance [25]. All relevant outcomes are in favor of retaining H0 for Hypothesis 4 and Hypothesis 5: pattern position does not affect the speed of the algorithm (Figure 5), and algorithm type and pattern position are independent (Table 5).

Correlation of algorithm types and programming language
A two-way analysis of variance was conducted on the influence of two independent variables (algorithm type and programming language) on execution time [26]. Algorithm type included ten levels (Brute Force, Backward-Oracle-Matching, Raita, Horspool's, Rabin-Karp, Berry-Ravindran, Zhu-Takaoka, Simon, Maximal-Shift, and Two-Way) and programming language consisted of two levels (C# and JAVA). All effects were statistically significant at the .05 significance level, except for the interaction between algorithm type and programming language. The main effect of programming language yielded F(1, 300)=5.55, p=.02, indicating a significant difference between C# (M=37895571, SD=16164128) and JAVA (M=51051913, SD=70858720). The main effect of algorithm type yielded F(9, 300)=2.67, p=.005, indicating that the effect of algorithm type was statistically significant: Brute Force (M=40777000.03, SD=9102097.57), Backward-Oracle-Matching (M=24253831.31, SD=16582280.02), Raita (M=34778040.63, SD=11037654.94), Horspool's (M=32954493.69, SD=7978283.54), Rabin-Karp (M=65079287.44, SD=12222770.78), Berry-Ravindran (M=65723112.56, SD=138603342.72), Zhu-Takaoka (M=31234328.09, SD=13230735.42), Simon (M=51696978.19, SD=14748129.57), Maximal-Shift (M=42857462.50, SD=21891319.54) and Two-Way (M=55382887.44, SD=69492398.66). The interaction effect was not significant, F(9, 300)=1.43, p=.176, η_p^2=.041. All relevant outcomes are in favor of rejecting H0 and accepting H1 for Hypothesis 6: the type of programming language has an impact on the execution time of the algorithm (Figure 6), with no significant interaction between algorithm type and programming language (Table 6).
Table 4: Independent variable: Algorithm type and pattern length.

Source              Type III sum of squares  df   Mean square  F      Sig.   Partial Eta squared
Algorithm           5.998E+16                9    6.665E+15    2.622  0.006  0.078
Length              6.51E+15                 3    2.17E+15     0.854  0.466  0.009
Algorithm × Length  7.555E+16                27   2.798E+15    1.101  0.338  0.096
Error               7.117E+17                280  2.542E+15
Total               1.487E+18                320
Corrected Total     8.537E+17                319
Note: R Squared=.166 (Adjusted R Squared=.050)
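The partial eta squared column and the R squared note of Table 4 can be reproduced from the sums of squares alone. The sketch below uses the standard definitions, partial eta squared = SS_effect / (SS_effect + SS_error) and R squared = 1 - SS_error / SS_corrected_total; the helper names are illustrative.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Share of variance attributable to one effect, ignoring other effects."""
    return ss_effect / (ss_effect + ss_error)


def r_squared(ss_error, ss_corrected_total):
    """Share of total variance explained by the whole model."""
    return 1.0 - ss_error / ss_corrected_total


# Type III sums of squares copied from Table 4.
SS_EFFECTS = {
    "algorithm": 5.998e16,
    "length": 6.51e15,
    "algorithm x length": 7.555e16,
}
SS_ERROR = 7.117e17
SS_CORRECTED_TOTAL = 8.537e17

effects = {
    name: round(partial_eta_squared(ss, SS_ERROR), 3)
    for name, ss in SS_EFFECTS.items()
}
model_r2 = round(r_squared(SS_ERROR, SS_CORRECTED_TOTAL), 3)

if __name__ == "__main__":
    print(effects, model_r2)
```

The recomputed values match the table (.078, .009, .096 and R squared of .166), which also shows why pattern length contributes so little: its sum of squares is roughly a hundredth of the error term.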

Figure 5 Impact of pattern position on algorithm run time.

Table 5: Independent variable: Algorithm type and pattern position.

Source     Type III sum of squares  df  Mean square  F     Sig.   Partial Eta squared
Algorithm  5.998E+16                9   6.665E+15    2.56  0.008  0.076


Figure 6 Impact of programing language and algorithm type.

Table 6: Independent variable: Algorithm type and programing language.

Source     Type III sum of squares  df  Mean square  F     Sig.   Partial Eta squared
Algorithm  5.998E+16                9   6.665E+15    2.56  0.008  0.076

Table 7: Independent variable: Algorithm type and programing language.

Source                            Type III sum of squares  df   Mean square  F      Sig.   Partial Eta squared
Programing language               1.385E+16                1    1.385E+16    5.554  0.019  0.018
Algorithm                         5.998E+16                9    6.665E+15    2.673  0.005  0.074
Programing language × Algorithm   3.2E+16                  9    3.556E+15    1.426  0.176  0.041
Error                             7.479E+17                300  2.493E+15
Total                             1.487E+18                320
Corrected Total                   8.537E+17                319
Note: R Squared=.124 (Adjusted R Squared=.068)

Discussions
This study approached the problem from the perspective of exact string matching, aiming to make a definitive distinction between the productivity of commonly accepted exact string-matching algorithms on the nucleotide alphabet. Throughout molecular investigations, scanning for oligonucleotide patterns is a commonly performed task. DNA antisense, microarray, gene cloning, and polymerase chain reaction analyses all require string matching in one form or another. Constructing an application based on reliability, productivity, and suitability for genomic sequences requires distinguishing between the available algorithms and selecting the best.

The outcome of algorithm design on runtime
The influence of algorithm architectural design on the runtime of exact string-matching algorithms has been widely discussed in prior studies [27]. Christian and others reported that the Boyer-Moore-Horspool algorithm, which preprocesses the search pattern, performed better on short patterns than the naïve algorithm, which lacks the preprocessing step [28]. Simone and Thierry addressed the exact string-matching problem in an elaborate experiment; their results reveal that the efficiency of algorithms is quite diverse for various alphabet sizes and pattern lengths [29]. Furthermore, AbdulRazzaq concluded that algorithm architecture was the cornerstone that affected the performance of some exact string-matching algorithms. It is evident from the related literature that algorithm architecture appears to have an influential role in performance at the time of implementation. The expected significance of algorithm design was formulated as Hypothesis 1 in this study. The best performance in the current study was scored by the Backward-Oracle-Matching algorithm (Figure 7). It relies on the factor oracle, an automaton built on a word p, a sequence of letters taken from an alphabet Σ. The Zhu-Takaoka algorithm ranked second in terms of performance. It is a variant of the Boyer-Moore algorithm. The rest of the algorithms performed fairly similarly, except for the Berry-Ravindran and Rabin-Karp algorithms, which performed poorly [30]. The results recorded in this experiment are consistent with past findings, which indicate that algorithm design affects runtime.

Figure 7 Run time for evaluated algorithms.


The outcome of the programming languages on runtime
To our understanding, this is the first study to evaluate the effect of the programming language on run time for exact string-matching algorithms. Hypothesis 6 was formulated to estimate the probable effect of the programming language on an algorithm's run time. The results were in favor of accepting the alternative hypothesis. C# provided better performance than JAVA: its mean execution time (M=37895571) was roughly three quarters of that of JAVA (M=51051913).

Challenges
Monitoring the efficiency of exact string-matching algorithms in terms of the tasks performed (e.g. palindrome sequence and fingerprint detection) and categorizing them by productivity rather than by the methodologies used is challenging. However, focusing on particular tasks helps the researcher to improve or implement only specific algorithms instead of selecting algorithms at random.

Conclusions
In this study, the fastest algorithms were Backward-Oracle-Matching, Zhu-Takaoka, and Horspool's, in that order. The architecture of the algorithm plays a critical role in performance. Moreover, the C# programming language outperformed the Java language, which confirms that the programming language has a real effect on the run time of the algorithms under trial. No pattern-related influence was shown, either for the length of the pattern or for its position in the target text, in contrast to previous studies that indicate a remarkable effect of this factor. Finally, we strongly recommend adding new algorithms to evaluate their performance, and expanding the scope of the possible factors that may affect the run time of algorithms, such as the operating system and the alphabet, in future studies.

References
1 Abbott A, Tsay A (2000) Sequence analysis and optimal matching methods in sociology. Sociol Methods Res 29: 3-33.
2 Razzaq AA, Rashid NA, Hasan AA, Hashem MA (2013) The exact string matching algorithms efficiency review. Glob J Technol 9: 12-18.
3 Allauzen C, Crochemore M, Raffinot M (1999) Factor Oracle: A New Structure for Pattern Matching. International Conference on Current Trends in Theory and Practice of Computer Science 27: 295-310.
4 Zhang C, Pang J (2012) An Algorithm for Probabilistic Alternating Simulation. International Conference on Current Trends in Theory and Practice of Computer Science 21: 431-442.
5 Boyer RS, Moore JS (1977) A Fast String Searching Algorithm. Communications of the ACM 20: 762-772.
6 Feng HX, Yubao YY, Lu X (2010) Hybrid Pattern-Matching Algorithm based on BM-KMP Algorithm. International Conference on Advanced Computer Theory and Engineering 8: 305-310.
7 Cao Z, Yan Z, Liu L (2015) A Fast String Matching Algorithm based on Lowlight Characters in the Pattern. International Conference on Advanced Computational Intelligence (ICACI) 27: 179-182.
8 Hakak S, Kamsin A, Shivakumara P, Idris MYI, Gilkar GA (2018) A new split based searching for exact pattern matching for natural texts. 13: 24-26.
9 Krallinger M, Valencia A, Hirschman L (2008) Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol 9: 1-4.
10 Allmer J (2017) Exact pattern matching: Adapting the Boyer-Moore algorithm for DNA searches. PeerJ PrePrints.
11 Berry T, Ravindran S (1999) A Fast String Matching Algorithm and Experimental Results. In Stringology 19: 16-28.
12 Washietl S (2005) Prediction of structural non-coding RNAs by comparative sequence analysis.
13 Crochemore M, Perrin D (1991) Two-way string-matching. J Assoc Comput 38: 651-675.
14 Deighton RA. Using Rabin-Karp fingerprints and Level DB for faster searches.
15 Frakes WB, Yates RB (1992) Information retrieval: Data structures and algorithms. Prentice-Hall.
16 Knuth DE, Morris Jr JH, Pratt VR (1977) Fast pattern matching in strings. J Comput 6: 323-350.
17 Michailidis PD, Margaritis KG (2002) On-line approximate string searching algorithms: Survey and experimental results. Int J Comput Math 79: 8678-88.
18 Morris Jr J, Pratt V (1970) A linear pattern-matching algorithm.
19 Mozgovoy M (2007) Enhancing computer-aided plagiarism detection. Joensuun yliopisto.
20 Naser MA, Rashid NA, Aboalmaaly MF (2012) Quick-skip search hybrid algorithm for the exact string matching problem. Int J Comput Theory Eng 4: 259-262.
21 Raita T (1992) Tuning the Boyer-Moore-Horspool string searching algorithm. Software: Practice and Experience 10: 879-884.
22 Rasool A, Tiwari A, Singla G, Khare N (2012) String matching methodologies: A comparative analysis. 11: 30-40.
23 Sahota V, Li M, Bayford R (2013) MPS: improving exact string matching through pattern character frequency. J Data Process 3: 127-129.
24 Sheik SS, Aggarwal SK, Poddar A, Sathiyabhama B, Balakrishnan N et al. (2005) Analysis of string-searching algorithms on biological sequence databases. J Curr Sci 25: 368-374.


25 Simon I (1994) String matching algorithms and automata. Trends Theor Comput Sci 23: 386-395.
26 Lovis C, Baud RH (2000) Fast exact string pattern-matching algorithms adapted to the characteristics of the medical language. J Am Med Inform Assoc 7: 378-391.
27 Faro S, Lecroq T (2010) The exact string matching problem: a comprehensive experimental evaluation.
28 Sunday DM (1990) A very fast substring search algorithm. Commun ACM 33: 132-142.
29 Wu S, Manber U (1992) Agrep - A Fast Approximate Pattern-Matching Tool. In Usenix Winter Technical Conference 24: 153-162.
30 Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Mach Learn 42: 31-60.
