Abstract
Bioinformatics has grown considerably since the development of the first sequencing machine, being today intensively used with the next generation DNA sequencers. Viral genomes represent a great challenge to bioinformatics due to its high mutation rate, forming quasispecies in the same infected host. In this paper, we implement and evaluate the performance of a genetic algorithm, named GAVGA, through the quality of a viral genome assembly. The assembly process works by first clustering the reads that share a common substring called seed and for each cluster, checks if there are overlapping reads with a given similarity percentage using a genetic algorithm. The assembled data are then compared to Newbler, SPAdes and ABySS assemblers, and also to a viral assembler such as VICUNA, which confirms the feasibility of our approach. GAVGA was implemented in python 2.7+ and can be downloaded at https://sourceforge.net/projects/gavga-assembler/.
Similar content being viewed by others
References
Nagarajan, N., Pop, M.: Sequence assembly demystified. Nat. Rev. Genet. 14(3), 157–167 (2013)
Palmer, L.E., Dejori, M., Bolanos, R., Fasulo, D.: Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction. BMC Bioinform. 11(1), 33 (2010)
Goés, F., Alves, R., Corrêa, L., Chaparro, C., Thom, L.: Towards an ensemble learning strategy for metagenomic gene prediction. In: Campos, S. (ed.) BSB 2014. LNCS, vol. 8826, pp. 17–24. Springer, Cham (2014). doi:10.1007/978-3-319-12418-6_3
Peltola, H., Söderlund, H., Tarhio, J., Ukkonen, E.: Algorithms for some string matching problems arising in molecular genetics. In: IFIP Congress, pp. 59–64 (1983)
Henn, M.R., Boutwell, C.L., Charlebois, P., Lennon, N.J., Power, K.A., Macalalad, A.R., Zody, M.C.: Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog. 8(3), e1002529 (2012)
Yang, X., Charlebois, P., Gnerre, S., Coole, M.G., Lennon, N.J., Levin, J.Z., Henn, M.R.: De novo assembly of highly diverse viral populations. BMC Genom. 13(1), 475 (2012)
Hunt, M., Gall, A., Ong, S.H., Brener, J., Ferns, B., Goulder, P., Otto, T.D.: IVA: accurate de novo assembly of RNA virus genomes. Bioinformatics 31(14), 2374–2376 (2015)
Ruby, J.G., Bellare, P., DeRisi, J.L.: PRICE: software for the targeted assembly of components of (Meta) genomic sequence data. G3: Genes Genomes Genet. 3(5), 865–880 (2013)
Yamashita, A., Sekizuka, T., Kuroda, M.: VirusTAP: Viral genome-targeted assembly pipeline. Frontiers in microbiology 7, 32 (2016)
Gusfield, D. Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge university press (1997)
Szwarcfiter, J.L.: Grafos e algoritimos computacionais, 2nd edn. Rio de Janeiro (1998)
Hoffman, A.J., Wolfe, J., Garfinkel, R.S., Johnson, D.S., Papadimitriou, C.H., Gilmore, P.C., Golden, B.L.: The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley, Hoboken (1986)
Bertossi, A.A.: The edge Hamiltonian path problem is NP-complete. Inf. Process. Lett. 13(4–5), 157–159 (1981)
Héam, P.C., Hugot, V., Kouchnarenko, O.: The emptiness problem for tree automata with at least one global disequality constraint is NP-hard. Inf. Process. Lett. 118, 6–9 (2017)
Davis, L.: Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York (1991)
NCBI Illumina MiSeq paired end sequencing of Human Immunodeficiency Virus 1. http://www.ncbi.nlm.nih.gov/sra/?term=ERR846528. Accessed 15 Apr 2017
Sherry, S.: NCBI SRA Toolkit technology for next generation sequence data. In: Plant and Animal Genome XX Conference (2012)
Zhang, J., Kobert, K., Flouri, T., Stamatakis, A.: PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30(5), 614–620 (2014)
Chu, J., Sadeghi, S., Raymond, A., Jackman, S.D., Nip, K.M., Mar, R., Birol, I.: BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters. Bioinformatics 30(23), 3402–3404 (2014)
Schmieder, R., Edwards, R.: Quality control and preprocessing of metagenomic datasets. Bioinformatics 27(6), 863–864 (2011)
Vavak, F., Fogarty, T.C.: Comparison of steady state and generational genetic algorithms for use in nonstationary environments. In: Proceedings of IEEE International Conference on Evolutionary Computation, pp. 192–195. IEEE (1996)
Rechenberg, I.: Evolution Strategy: Optimization of Technical systems by means of biological evolution. Fromman-Holzboog, Stuttgart, 104 (1973)
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)
Parsons, R.J., Forrest, S., Burks, C.: Genetic algorithms, operators, and DNA fragment assembly. Mach. Learn. 21(1), 11–33 (1995)
Fraga, J.S. Algoritmos genéticos e o problema de montagem de reads (Master’s thesis) (2014)
Kikuchi, S., Chakraborty, G.: Heuristically tuned GA to solve genome fragment assembly problem. In: IEEE Congress on Evolutionary Computation, CEC 2006. pp. 1491–1498. IEEE (2006)
Indumathy, R., Maheswari, S.U.: Nature inspired algorithms to solve DNA fragment assembly problem: a survey. Int. J. Bioinform. Res. Appl. 2(2), 45–50 (2012)
Hughes, J., Houghten, S., Mallen-Fullerton, G. M., Ashlock, D.: Recentering and restarting genetic algorithm variations for dna fragment assembly. In: 2014 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, pp. 1–8. IEEE (2014)
Huang, W., Li, L., Myers, J.R., Marth, G.T.: ART: a next-generation sequencing read simulator. Bioinformatics 28(4), 593–594 (2012)
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Pyshkin, A.V.: SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19(5), 455–477 (2012)
Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J., Birol, I.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19(6), 1117–1123 (2009)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Oliveira, R.R.M. et al. (2017). GAVGA: A Genetic Algorithm for Viral Genome Assembly. In: Oliveira, E., Gama, J., Vale, Z., Lopes Cardoso, H. (eds) Progress in Artificial Intelligence. EPIA 2017. Lecture Notes in Computer Science(), vol 10423. Springer, Cham. https://doi.org/10.1007/978-3-319-65340-2_33
Download citation
DOI: https://doi.org/10.1007/978-3-319-65340-2_33
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65339-6
Online ISBN: 978-3-319-65340-2
eBook Packages: Computer ScienceComputer Science (R0)