Evaluating Efficiency of Some Exact Stri
Received: September 24, 2021; Accepted: October 08, 2021; Published: October 15, 2021
Introduction
String matching is an essential problem-solving technique encountered by specialists from various disciplines, e.g. data mining, artificial intelligence, and bioinformatics [1]. Numerous algorithms and methods have been proposed for pattern recognition, and there are abundant applications and online servers that perform exact string matching on biological data [2]. The methodologies used to recognize patterns differ greatly, owing to variations in algorithmic architecture [3]. Generally, string-matching algorithms can be broadly classified into five distinct classes: (a) algorithms that solve the problem by character comparisons, (b) algorithms that depend on the use of automatic probabilistic simulation, (c) non-probabilistic simulation algorithms, (d) constant-space algorithms, and (e) real-time algorithms [4].
The simplest and most traditional matching approach is to compare the pattern characters with the characters in the target text. This is the so-called Naïve or Brute-Force algorithm [5]. The algorithm pre-processes neither the text nor the pattern. Its worst-case time complexity is O(nm), where m and n denote the pattern length and the text length, respectively. Subsequently, numerous algorithms have substantially improved on the Brute-Force running time. The worst-case lower bound of the string-matching problem is O(n). The first algorithm to reach this bound was given by Morris and Pratt [6] and later improved by Knuth. Linear algorithms based on bit-parallelism were introduced by Baeza-Yates and Manber. Xian-Feng et al. presented the KMPBS algorithm, a hybrid algorithm based on the Boyer–Moore (BM) and Knuth-Morris-Pratt (KMP) algorithms. The text T is scanned from left to right for the given pattern P of length m. When searching, the very last character of P is compared to the corresponding character
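The Brute-Force comparison described in this introduction can be sketched as follows. This is a minimal illustration in Python, not the C# or Java code evaluated in the study; the function name `brute_force_search` and the comparison counter are assumptions introduced here to make the O(nm) worst case visible.

```python
def brute_force_search(text, pattern):
    """Return (list of match positions, number of character comparisons)."""
    n, m = len(text), len(pattern)
    positions = []
    comparisons = 0
    # Try every alignment of the pattern against the text, left to right.
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break  # mismatch: shift the pattern by one position
            j += 1
        if j == m:
            positions.append(start)
    return positions, comparisons


# Ordinary case: the pattern occurs twice.
print(brute_force_search("abracadabra", "abra")[0])

# Worst-case style input: every alignment matches until the last character,
# so the comparison count is (n - m + 1) * m.
print(brute_force_search("a" * 10, "aaab"))
```

No pre-processing of the text or the pattern takes place, which is why the number of comparisons grows with the product of the two lengths on inputs such as the second one.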
© Under License of Creative Commons Attribution 3.0 License | This article is available in: http://colorectal-cancer.imedpub.com/archive.php 1
American Journal of Computer Science and Information Technology, 2021, Vol. 9 No. 9: 112
execution time was explained by algorithm type (F(9,280)=2.56, p=.008, η_p²=.08). The main effect of pattern position yielded an effect size of .01, indicating that pattern position accounted for 1% of the variance in algorithm execution time (F(3,280)=.73, p=.53, η_p²=.01). The interaction effect between the two factors was not significant (F(27,280)=.84, p=.70, η_p²=.08), indicating that no significant combined effect of algorithm type and pattern position on execution time was observed; the interaction accounted for only 8% of the variance [25]. All relevant outcomes are in favor of accepting H0 for Hypothesis 4 and Hypothesis 5. The pattern position does not affect the speed of the algorithm (Figure 5), and the two factors are independent (Table 5).

Correlation of algorithm types and programming language
A two-way analysis of variance was conducted to examine the effect of two independent variables (algorithm type and programming language) on the execution time [26]. The algorithm type included ten levels (Brute Force, Backward-Oracle-Matching, Raita, Horspool's, Rabin-Karp, Berry-Ravindran, Zhu-Takaoka, Simon, Maximal-Shift, and Two-Way) and programming language consisted of two levels (C# and Java). All effects were statistically significant at the .05 significance level, except for the interaction between algorithm type and programming language. The main effect for programming language yielded an F ratio of F(1, 300)=5.55, p=.02, indicating a significant difference between C# (M=37895571, SD=16164128) and Java (M=51051913, SD=70858720). The main effect for algorithm type yielded an F ratio of F(9, 300)=2.67, p=.005, indicating that the effect of algorithm type was statistically significant: Brute Force (M=40777000.03, SD=9102097.57), Backward-Oracle-Matching (M=24253831.31, SD=16582280.02), Raita (M=34778040.63, SD=11037654.94), Horspool's (M=32954493.69, SD=7978283.54), Rabin-Karp (M=65079287.44, SD=12222770.78), Berry-Ravindran (M=65723112.56, SD=138603342.72), Zhu-Takaoka (M=31234328.09, SD=13230735.42), Simon (M=51696978.19, SD=14748129.57), Maximal-Shift (M=42857462.50, SD=21891319.54), and Two-Way (M=55382887.44, SD=69492398.66). The interaction effect was insignificant, F(9, 300)=1.43, p=.041. All relevant outcomes are in favor of rejecting H0 and accepting H1 for Hypothesis 6. The type of programming language has an impact on the execution time of the algorithm (Figure 6), with no significant interaction observed between algorithm type and programming language (Table 6).
Table 4: Independent variables: algorithm type and pattern length.
Source               Type III sum of squares   df    Mean square   F       Sig.    Partial Eta squared
Algorithm            5.998E+16                 9     6.665E+15     2.622   0.006   0.078
Length               6.51E+15                  3     2.17E+15      0.854   0.466   0.009
Algorithm × Length   7.555E+16                 27    2.798E+15     1.101   0.338   0.096
Error                7.117E+17                 280   2.542E+15
Total                1.487E+18                 320
Corrected Total      8.537E+17                 319
Note: R Squared=.166 (Adjusted R Squared=.050)
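The derived columns of Table 4 can be recomputed from the sums of squares and degrees of freedom, which is a quick way to check the table. The sketch below assumes the standard two-way ANOVA definitions (mean square = SS/df, F = MS_effect/MS_error, partial eta squared = SS_effect/(SS_effect + SS_error), R squared = 1 - SS_error/SS_corrected total). The variable and function names are illustrative and do not come from the study.

```python
# Sums of squares and degrees of freedom copied from Table 4.
ss = {"Algorithm": 5.998e16, "Length": 6.51e15, "Algorithm x Length": 7.555e16}
df = {"Algorithm": 9, "Length": 3, "Algorithm x Length": 27}
ss_error, df_error = 7.117e17, 280
ss_corrected_total = 8.537e17


def anova_row(ss_effect, df_effect):
    """Return (mean square, F ratio, partial eta squared) for one effect."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    f_ratio = ms_effect / ms_error
    partial_eta_sq = ss_effect / (ss_effect + ss_error)
    return ms_effect, f_ratio, partial_eta_sq


rows = {name: anova_row(ss[name], df[name]) for name in ss}
r_squared = 1 - ss_error / ss_corrected_total

for name, (ms, f_ratio, eta) in rows.items():
    print(f"{name:20s} MS={ms:.3e}  F={f_ratio:.3f}  partial eta^2={eta:.3f}")
print(f"R squared={r_squared:.3f}")
```

Under these assumptions the recomputed F ratios, partial eta squared values, and R squared agree with the tabulated values to three decimal places.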
The outcome of the programming languages on runtime
To our knowledge, this is the first study to evaluate the effect of the programming language on the run time of exact string-matching algorithms. Hypothesis 6 was formulated to estimate the probable effect of the programming language on the algorithm's run time. The results favored accepting the alternative hypothesis: C# performed better than Java, by 75%.

Challenges
Monitoring the efficiency of exact string-matching algorithms in terms of the tasks performed (e.g. palindrome sequence and fingerprint detection) and categorizing them by productivity rather than by the methodologies used is challenging. However, focusing on particular tasks helps the researcher improve or implement only specific algorithms instead of selecting algorithms at random.

Conclusions
In this study, the fastest algorithms were Backward-Oracle-Matching, Zhu-Takaoka, and Horspool's, respectively. The architecture of the algorithm plays a critical role in performance. Moreover, the C# programming language outperformed Java, confirming that the programming language affects the run time of the algorithms under test. No pattern-related influence was observed, either for the length of the pattern or for its position in the target text, in contrast to previous studies that report a remarkable effect of this factor. Finally, we strongly recommend adding new algorithms to evaluate their performance, and expanding the scope of factors that may affect algorithm run time, such as the operating system and the alphabet, in future studies.

References
1 Abbott A, Tsay A (2000) Sequence analysis and optimal matching methods in sociology. Sociol Methods Res 29: 3-33.
2 Razzaq AA, Rashid NA, Hasan AA, Hashem MA (2013) The exact string matching algorithms efficiency review. Glob J Technol 9: 12-18.
3 Allauzen C, Crochemore M, Raffinot M (1999) Factor Oracle: A New Structure for Pattern Matching. International Conference on Current Trends in Theory and Practice of Computer Science 27: 295-310.
4 Zhang C, Pang J (2012) An Algorithm for Probabilistic Alternating Simulation. International Conference on Current Trends in Theory and Practice of Computer Science 21: 431-442.
5 Boyer RS, Moore JS (1977) A Fast String Searching Algorithm. Communications of the ACM 20: 762-772.
6 feng HX, Yubao YY, Lu X (2010) Hybrid Pattern-Matching Algorithm based on BM-KMP Algorithm. International Conference on Advanced Computer Theory and Engineering 8: 305-310.
7 Cao Z, Yan Z, Liu L (2015) A Fast String Matching Algorithm based on Lowlight Characters in the Pattern. International Conference on Advanced Computational Intelligence (ICACI) 27: 179-182.
8 Hakak S, Kamsin A, Shivakumara P, Idris MYI, Gilkar GA (2018) A new split based searching for exact pattern matching for natural texts. 13: 24-26.
9 Krallinger M, Valencia A, Hirschman L (2008) Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol 9: 1-4.
10 Allmer J (2017) Exact pattern matching: Adapting the Boyer-Moore algorithm for DNA searches. PeerJ PrePrints.
11 Berry T, Ravindran S (1999) A Fast String Matching Algorithm and Experimental Results. In Stringology 19: 16-28.
12 Washietl S (2005) Prediction of structural non-coding RNAs by comparative sequence analysis.
13 Crochemore M, Perrin D (1991) Two-way string-matching. J Assoc Comput 38: 651-675.
14 Deighton RA. Using Rabin-Karp fingerprints and Level DB for faster searches.
15 Frakes WB, Yates RB (1992) Information retrieval: Data structures and algorithms. Prentice-Hall.
16 Knuth DE, Morris Jr JH, Pratt VR (1977) Fast pattern matching in strings. J Comput 6: 323-350.
17 Michailidis PD, Margaritis KG (2002) On-line approximate string searching algorithms: Survey and experimental results. Int J Comput Math 79: 8678-88.
18 Morris Jr J, Pratt V (1970) A linear pattern-matching algorithm.
19 Mozgovoy M (2007) Enhancing computer-aided plagiarism detection. Joensuun yliopisto.
20 Naser MA, Rashid NA, Aboalmaaly MF (2012) Quick-skip search hybrid algorithm for the exact string matching problem. Int J Comput Theory Eng 4: 259-262.
21 Raita T (1992) Tuning the Boyer-Moore-Horspool string searching algorithm. Software: Practice and Experience 10: 879-884.
22 Rasool A, Tiwari A, Singla G, Khare N (2012) String matching methodologies: A comparative analysis. 11: 30-40.
23 Sahota V, Li M, Bayford R (2013) MPS: improving exact string matching through pattern character frequency. J Data Process 3: 127-129.
24 Sheik SS, Aggarwal SK, Poddar A, Sathiyabhama B, Balakrishnan N et al. (2005) Analysis of string-searching algorithms on biological sequence databases. J Curr Sci 25: 368-374.
25 Simon I (1994) String matching algorithms and automata. Trends Theor Comput Sci 23: 386-395.
26 Lovis C, Baud RH (2000) Fast exact string pattern-matching algorithms adapted to the characteristics of the medical language. J Am Med Inform Assoc 7: 378-391.
27 Faro S, Lecroq T (2010) The exact string matching problem: a comprehensive experimental evaluation.
28 Sunday DM (1990) A very fast substring search algorithm. Commun ACM 33: 132-142.
29 Wu S, Manber U (1992) Agrep–A Fast Approximate Pattern-Matching Tool. In Usenix Winter Technical Conference 24: 153-162.
30 Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Mach Learn 42: 31-60.