Abstract
We present a mathematical framework for anchoring in global multiple alignment. Our framework uses anchors that are hits to spaced seeds and identifies them progressively, guided by a phylogenetic tree. We compute anchors in the tree in two passes: from the root down to the leaves, and from the leaves going up. In both cases, we compute thresholds for anchors to minimize errors. One innovative aspect of our approach is the approximate inference of ancestral sequences with accommodation for ambiguity. Combined with proper scoring techniques and seeding, this lets us pick many anchors in homologous positions as we align up the phylogenetic tree, minimizing total work. In simulations, our algorithm is reasonably successful: its accuracy is comparable to that of existing software, and it is substantially more efficient.
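To make two of the ingredients named above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no seed pattern, threshold, or scoring scheme. The function names (`seed_keys`, `spaced_seed_hits`, `fitch_column`), the seed `1101`, and the toy sequences are all hypothetical. The first part finds exact hits to a spaced seed between two sequences. The second shows one standard way to keep ambiguity at an ancestral position, the set rule from Fitch-style parsimony, which may differ from the approximate inference the paper uses.

```python
from collections import defaultdict


def seed_keys(seq, seed):
    """Yield (position, key) for every window of seq as long as the seed.

    The key keeps only the characters under the '1' positions of the seed;
    positions marked '0' are "don't care" and are skipped.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    for pos in range(len(seq) - len(seed) + 1):
        yield pos, "".join(seq[pos + i] for i in care)


def spaced_seed_hits(a, b, seed):
    """Return sorted (pos_a, pos_b) pairs whose windows agree at all '1' positions."""
    index = defaultdict(list)
    for pos_a, key in seed_keys(a, seed):
        index[key].append(pos_a)
    hits = []
    for pos_b, key in seed_keys(b, seed):
        for pos_a in index.get(key, ()):
            hits.append((pos_a, pos_b))
    return sorted(hits)


def fitch_column(left, right):
    """Fitch-style set rule for one ancestral position (an assumed stand-in).

    If the children's candidate sets overlap, keep the overlap;
    otherwise keep the union, which records the ambiguity.
    """
    common = left & right
    return common if common else left | right


# Hypothetical toy input: the sequences differ at index 2, which the
# seed's '0' position tolerates for the windows starting at 0.
hits = spaced_seed_hits("ACGTACGT", "ACCTACGT", "1101")
print(hits)
print(fitch_column({"A"}, {"G"}))
```

In this sketch every hit is a candidate anchor. A real aligner would still need to filter candidates, for example with the anchor thresholds the abstract mentions, before committing to them.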
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brown, D.G., Hudek, A.K. (2004). New Algorithms for Multiple DNA Sequence Alignment. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science, vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_27
DOI: https://doi.org/10.1007/978-3-540-30219-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23018-2
Online ISBN: 978-3-540-30219-3
eBook Packages: Springer Book Archive