Abstract
This work presents a knowledge discovery method for data obtained from molecular dynamics simulations of protein unfolding. The data under study were obtained from simulations of the unfolding of transthyretin (TTR), a protein responsible for amyloid diseases such as Familial Amyloid Polyneuropathy (FAP). Protein unfolding and misfolding lie at the origin of many amyloidogenic diseases. The molecular characterization of protein unfolding through experimental and simulation methods may therefore be essential for developing effective treatments. Here, we analyzed how the distance from the Cα (alpha carbon) atom of each of the 127 amino acids of TTR to the protein's centre of mass varies over 10 unfolding simulations: five of wild-type TTR (WT-TTR) and five of L55P-TTR, a highly amyloidogenic TTR variant. Applying data mining techniques to the information from all 10 runs, we identified several clusters of amino acids. For each cluster, we selected a representative element and identified events in its distance profile, which were then used as features. Using association rules, we found patterns that characterize each TTR variant under study. These results may help discriminate between amyloidogenic and non-amyloidogenic behaviour among TTR variants and contribute to understanding the molecular mechanisms of FAP.
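The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several stated assumptions. The per-residue distance series are computed from raw coordinates. The clustering and representative-selection step is omitted. The event definition ("residue moves away by more than a threshold") is a hypothetical stand-in for the paper's events. Each simulation run is treated as one transaction labelled with its TTR variant. The rule miner is a naive brute-force enumeration of class association rules, not the rule-mining tool used in the paper. All function names and the toy data are illustrative.

```python
import math
from itertools import combinations

Vec = tuple[float, float, float]


def centre_of_mass(coords: list[Vec], masses: list[float]) -> Vec:
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for c, m in zip(coords, masses)) / total for k in range(3)
    )


def ca_distances(ca_coords: list[Vec], com: Vec) -> list[float]:
    """Euclidean distance from each C-alpha atom to the centre of mass."""
    return [math.dist(c, com) for c in ca_coords]


def distance_series(frames: list[list[Vec]], masses: list[float],
                    ca_idx: list[int]) -> dict[int, list[float]]:
    """Per-residue distance-to-centre-of-mass time series.

    frames: one list of atom coordinates per simulation time step.
    ca_idx: positions of the C-alpha atoms in each frame, in residue order;
    residues are numbered from 1 in the returned dict.
    """
    per_frame = []
    for coords in frames:
        com = centre_of_mass(coords, masses)
        per_frame.append(ca_distances([coords[i] for i in ca_idx], com))
    return {r + 1: [d[r] for d in per_frame] for r in range(len(ca_idx))}


def event_items(series: dict[int, list[float]], threshold: float) -> set[str]:
    """HYPOTHETICAL event definition: residue r 'moves away' if its distance
    ever exceeds its initial value by more than `threshold`."""
    return {f"r{r}_away" for r, d in series.items() if max(d) - d[0] > threshold}


def class_rules(runs, min_sup, min_conf, max_len=2):
    """Brute-force class association rules `itemset -> label`.

    runs: list of (set_of_event_items, variant_label), one per simulation.
    support = fraction of runs containing both the itemset and the label;
    confidence = that count divided by the number of runs containing the itemset.
    """
    n = len(runs)
    items = sorted(set().union(*(its for its, _ in runs)))
    labels = sorted({lab for _, lab in runs})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            covered = [lab for its, lab in runs if set(antecedent) <= its]
            if not covered:
                continue
            for lab in labels:
                hits = covered.count(lab)
                sup, conf = hits / n, hits / len(covered)
                if sup >= min_sup and conf >= min_conf:
                    rules.append((antecedent, lab, sup, conf))
    return rules


# Toy, made-up transactions (not data from the paper): two runs per variant.
runs = [
    ({"r10_away", "r20_away"}, "L55P"),
    ({"r20_away"}, "L55P"),
    ({"r10_away"}, "WT"),
    (set(), "WT"),
]
for antecedent, label, sup, conf in class_rules(runs, min_sup=0.25, min_conf=1.0):
    print(antecedent, "->", label, f"support={sup:.2f} confidence={conf:.2f}")
```

On the toy runs, the sketch reports `('r20_away',) -> L55P` with support 0.50 and confidence 1.00, plus a longer rule with the same consequent. Pruning such non-minimal rules is left out for brevity.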
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fernandes, E., Jorge, A.M., Silva, C.G., Brito, R.M.M. (2009). A Knowledge Discovery Method for the Characterization of Protein Unfolding Processes. In: Corchado, J.M., De Paz, J.F., Rocha, M.P., Fernández Riverola, F. (eds) 2nd International Workshop on Practical Applications of Computational Biology and Bioinformatics (IWPACBB 2008). Advances in Soft Computing, vol 49. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85861-4_22
DOI: https://doi.org/10.1007/978-3-540-85861-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85860-7
Online ISBN: 978-3-540-85861-4
eBook Packages: Engineering, Engineering (R0)