Identification of Distinguishing Motifs

Feng, WangSen; Wang, Zhanyong; Wang, Lusheng

doi:10.1007/978-3-540-73437-6_26

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4580)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

Motivation: Motif identification for sequences has many important applications in biological studies, e.g., diagnostic probe design, locating binding sites and regulatory signals, and potential drug target identification. There are two versions of the problem.

1. Single Group: Given a group of n sequences, find a length-l motif that appears in each of the given sequences, such that the occurrences of the motif are similar.
2. Two Groups: Given two groups of sequences B and G, find a length-l (distinguishing) motif that appears in every sequence in B and does not appear anywhere in the sequences in G.

Here the occurrences of the motif in the given sequences may contain errors. Currently, most existing programs can only handle the single-group case. Moreover, it is very difficult to use edit distance (allowing indels and replacements) for motif detection.
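To make the two problem statements concrete, the following minimal sketch checks whether a given candidate motif satisfies them under edit distance. It is not the authors' algorithm: it only verifies a candidate and does not search for one. The function names and the thresholds `d_b` (maximum errors allowed for an occurrence in B) and `d_g` (minimum distance required from every substring of G) are assumptions introduced for illustration, since the abstract does not state how the error bounds are parameterized.

```python
from typing import Sequence


def min_occurrence_distance(motif: str, seq: str) -> int:
    """Smallest edit distance between `motif` and any substring of `seq`.

    Standard approximate-matching dynamic programme: the row for the empty
    motif prefix is all zeros, so an occurrence may start anywhere in `seq`.
    """
    prev = [0] * (len(seq) + 1)
    for i, m in enumerate(motif, start=1):
        curr = [i]  # motif[:i] against an empty occurrence costs i edits
        for j, c in enumerate(seq, start=1):
            curr.append(min(
                prev[j - 1] + (m != c),  # match or replacement
                prev[j] + 1,             # motif character absent from seq
                curr[j - 1] + 1,         # extra character in seq
            ))
        prev = curr
    return min(prev)  # the occurrence may end anywhere in seq


def is_common_motif(motif: str, group: Sequence[str], d_b: int) -> bool:
    """Single-group check: every sequence holds an occurrence within d_b edits."""
    return all(min_occurrence_distance(motif, s) <= d_b for s in group)


def is_distinguishing_motif(motif: str, B: Sequence[str], G: Sequence[str],
                            d_b: int, d_g: int) -> bool:
    """Two-group check (hypothetical thresholds): close to B, far from G."""
    return is_common_motif(motif, B, d_b) and all(
        min_occurrence_distance(motif, g) >= d_g for g in G
    )


if __name__ == "__main__":
    B = ["TTACGTTT", "GGAGTCC"]   # second sequence holds "AGT", one deletion away
    G = ["CCCCCCCC"]
    print(is_distinguishing_motif("ACGT", B, G, d_b=1, d_g=2))
```

Each check costs time proportional to the motif length times the sequence length, so verifying one candidate is cheap. The hard part, which the paper addresses, is finding a good candidate among all possible length-l strings.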

Results: (1) We propose a randomized algorithm for the single-group problem that can handle indels in the occurrences of the motif. (2) We give an algorithm for the two-group problem. (3) We have run extensive simulations to evaluate the algorithms.

Author information

Authors and Affiliations

Department of Computer Science, Peking University, People’s Republic of China
WangSen Feng
Department of Computer Science, City University of Hong Kong, Hong Kong
Zhanyong Wang & Lusheng Wang

Editor information

Bin Ma, Kaizhong Zhang

About this paper

Cite this paper

Feng, W., Wang, Z., Wang, L. (2007). Identification of Distinguishing Motifs. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_26

DOI: https://doi.org/10.1007/978-3-540-73437-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73436-9
Online ISBN: 978-3-540-73437-6
eBook Packages: Computer Science, Computer Science (R0)
