Abstract
In this paper, we describe a machine learning approach for sequence-based prediction of protein-protein interaction sites. A support vector machine (SVM) classifier was trained to predict whether a surface residue is an interface residue (i.e., is located in the protein-protein interaction surface), based on the identity of the target residue and its ten sequence neighbors. Separate classifiers were trained on proteins from two categories of complexes: antigen-antibody and protease-inhibitor. The effectiveness of each classifier was evaluated using leave-one-out (jackknife) cross-validation. Interface and non-interface residues were classified with relatively high sensitivity (82.3% and 78.5%) and specificity (81.0% and 77.6%) for proteins in the antigen-antibody and protease-inhibitor complexes, respectively. The correlation between predicted and actual labels was 0.430 and 0.462, indicating that the method performs substantially better than chance (zero correlation). Combined with recently developed methods for identifying surface residues from sequence information, this approach offers a promising way to predict residues involved in protein-protein interactions from sequence information alone.
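As a rough illustration of the input representation and the reported metrics, the sketch below encodes a target residue together with its ten sequence neighbors and computes sensitivity, specificity, and a correlation coefficient from predicted labels. Several details are assumptions not stated in the abstract: the window is taken to be symmetric (five residues on each side), residues are one-hot encoded, positions beyond the chain ends are zero-filled, sensitivity and specificity use the common definitions TP/(TP+FN) and TN/(TN+FP), and the correlation is taken to be the Matthews correlation coefficient. The function names are hypothetical, and the SVM training step itself is omitted.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def encode_window(sequence, index, flank=5):
    """One-hot encode the residue at `index` plus `flank` neighbors per side.

    Assumption: positions past either chain end (and non-standard residues)
    are left as all-zero vectors; the abstract does not specify this.
    """
    features = []
    for pos in range(index - flank, index + flank + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            aa_index = AMINO_ACIDS.find(sequence[pos])
            if aa_index >= 0:
                one_hot[aa_index] = 1
        features.extend(one_hot)
    return features


def evaluate(actual, predicted):
    """Return (sensitivity, specificity, Matthews correlation) for 0/1 labels.

    Label 1 marks an interface residue, label 0 a non-interface residue.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


if __name__ == "__main__":
    # Toy sequence and labels, invented purely for illustration.
    window = encode_window("MKTAYIAKQR", 4)
    print(len(window))  # 11 positions x 20 residue types

    actual = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(evaluate(actual, predicted))
```

A correlation of zero corresponds to chance-level prediction and a value of 1 to perfect agreement, which is the scale on which the reported 0.430 and 0.462 can be read. In a leave-one-out setting, a function like `evaluate` would be applied to the predictions for each held-out item after training on the rest.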


Acknowledgements
This research was supported in part by grants from the National Science Foundation (0219699), the National Institutes of Health (GM066387), and the Iowa State University Plant Science Institute.
Cite this article
Yan, C., Honavar, V. & Dobbs, D. Identification of interface residues in protease-inhibitor and antigen-antibody complexes: a support vector machine approach. Neural Comput & Applic 13, 123–129 (2004). https://doi.org/10.1007/s00521-004-0414-3
DOI: https://doi.org/10.1007/s00521-004-0414-3