Aligning Multiple Protein Structures Using
vihten--klylvfeflhq-dlkkfmdasaltgiplpliksylfqllqglafchshrvlhrdlkpqnllintegaikladfgla
vlhsdk--kltlvfefcdq-dlkkyfdscn-gdldpeivksflfqllkglgfchsrnvlhrdlkpqnllinrngelklanfgla
afqthd--rlcfvmeyanggelffhlsrer--vfteerarfygaeivsaleylhsrdvvyrdiklenlmldkdghikitdfglc
vsrtdretkltlvfehvdq-dlttyldkvpepgvptetikdmmfqllrgldflhshrvvhrdlkpqnilvtssgqikladfgla
1GZ8 -----llBBBBBBBBBllllBBBBBBBll-lllBBBBBBBll-------l---HHHHHHHHHHlllllllBl---lBBB
1UNL -----llBBBBBBBBBllllBBBBBBBll-lllBBBBBBBBlll--llll-HHHHHHHHHHHHlllllllBl---lBBB
1O6L -llH-HHBBBBBBBBBlllBBBBBBBBll-lllBBBBBBBBHHHHHHlll--HHHHHHHHHHHHllllllBl---lBBB
1BLX lllHHHlBBBBBBBBBBllBBBBBBBBlllllBBBBBBBBBBBBlllllBllHHHHHHHHHHHHHlllllBllBBBBBB
BBBBll--BBBBBBBlllB-BHHHHHHHlllllllHHHHHHHHHHHHHHHHHHHHlllllllllHHHBBBlllllBBBllllHH
BBBlll--BBBBBBBlllB-BHHHHHHHll-llllHHHHHHHHHHHHHHHHHHHHllBBlllllHHHBBBlllllBBBllllll
BBBlll--BBBBBBBlllllBHHHHHHHHl--lllHHHHHHHHHHHHHHHHHHHHlllBlllllHHHBBBlllllBBBllllll
BBBlllBBBBBBBBBllll-BHHHHHHHlllllllHHHHHHHHHHHHHHHHHHHHlllllllllHHHBBBlllllBBBllllll
Figure 2: Multiple structure alignment of protein kinases. The top half displays the residues; the bottom half
displays the secondary structure elements – B for β-strand, H for α-helix, l for loop. The nucleotide binding site is
highlighted in blue, the ATP binding site is in yellow, and the proton acceptor site in green.
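As a rough illustration of how the agreement among the secondary-structure rows of Figure 2 could be quantified, the sketch below computes, for each alignment column, the fraction of rows that share the most common non-gap label. This is not msTALI's scoring scheme. The function names are hypothetical, and the four short strings are the last nine columns of the first block of secondary-structure rows in Figure 2.

```python
from collections import Counter

GAP = "-"  # gap symbol used in the alignment rows


def column_agreement(rows):
    """For each column, return the fraction of rows sharing the most common non-gap label."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    fractions = []
    for column in zip(*rows):
        labels = [c for c in column if c != GAP]
        if not labels:
            fractions.append(0.0)  # column made up entirely of gaps
            continue
        top_count = Counter(labels).most_common(1)[0][1]
        # Gaps count against agreement because we divide by the number of rows.
        fractions.append(top_count / len(column))
    return fractions


def fully_conserved(rows):
    """Indices of columns in which every row carries the same non-gap label."""
    return [i for i, f in enumerate(column_agreement(rows)) if f == 1.0]


# Last nine columns of the first secondary-structure block of Figure 2.
rows = [
    "Bl---lBBB",  # 1GZ8
    "Bl---lBBB",  # 1UNL
    "Bl---lBBB",  # 1O6L
    "BllBBBBBB",  # 1BLX
]

print(column_agreement(rows))
print(fully_conserved(rows))
```

Dividing by the number of rows rather than by the number of non-gap labels is a deliberate choice here: a column where only one structure has a residue is not treated as conserved.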
accessibility. The sequence component is unused. Because some key residues are conserved across all proteins being aligned, this provides a method for validation of the algorithm.

3.1 Protein Serine/Threonine Kinases

We aligned four members of the protein kinase catalytic subunit family as indicated in SCOP: 1GZ8, 1UNL, 1O6L, and 1BLX. All are serine/threonine kinases from Homo sapiens. The entire chain indicated in SCOP was aligned, although the serine/threonine kinase domain is the first ~50% of each chain. Here we display only the alignment of the serine/threonine kinase domain for brevity.

These structures were simultaneously aligned using msTALI. No sequence information was used to create the alignment. Because many of the structures have a high sequence identity to other structures in the alignment, sequence information provides a valuable, independent method for judging the quality of the alignment.

The full alignment is displayed in Figure 2. The score is an indication of the degree of conservation at a position, ranging from a low of 0 to a high of 9.

The majority of the secondary structure regions are conserved across all structures; in all cases, msTALI has aligned these regions together. Furthermore, for many of these conserved secondary structures of equal length, msTALI has aligned the ends of the structures. Where there is ambiguity about the precise alignment, such as the additional α-helices present in 1O6L, the hydrophobicity and surface accessibility components provide valuable information in determining the exact local alignment.

The functional regions of the protein serine/threonine kinases are annotated in UniProtKB [18]. These sites are highlighted in Figure 2. All of them were aligned correctly by msTALI and were scored highly, indicating well-conserved sites.

3.2 Acyl Carrier Proteins

The acyl carrier protein (ACP) family is involved in fatty acid synthesis, linking intermediates during synthesis via a thioester linkage. Here we have aligned a set of acyl carrier proteins from varying sources. Three are crystal structures: 2FAC, 1L0I, and 2EHS. Two are NMR structures: 2JQ4 and 1ACP. One,
2FAC tieervkkiigeqlgv--kqeevtnnasfvedlgadsldtvelvmaleeefdteipdeeaek--ittvqaaidyin---g-hq-
1L0I tieervkkiigeqlgv--kqeevtnnasfvedlgadsldtvelvmaleeefdteipdeeaek--mttvqaaidyin---g-hq-
2EHS -leervkeiiaeqlgv--ekekitpeakfvedlgadsldvvelimafeeefgieipdedaek--iqtvgdvinylk---e-k--
2JQ4 --natireilakfgqlptpvdtiadeadl-yaaglssfasvqlmlgieeafdiefpdnllnrksfasikaiedtvklildgkea
1ACP tieervkkiigeqlgv--kqeevtnnasfvedlgadsldtvelvmaleeefdteipdeeaek--ittvqaaidyin---g-hq-
AcpXL atfdkvadiiaetsei--dratitpeshtiddlgidsldfldivfaidkefgikiplekwtq---e-----vn-----------
2FAC lHHHHHHHHHHHHHll--lHHHlllllBllllllllHHHHHHHHHHHHHHllllllHHHHHl--llBHHHHHHHHH---H-Hl-
1L0I lHHHHHHHHHHHHHll--lHHHlllllBllllllllHHHHHHHHHHHHHHHlllllHHHHll--llBHHHHHHHHH---H-ll-
2EHS -HHHHHHHHHHHHHll--lHHHlllllBllllllllHHHHHHHHHHHHHHHlllllHHHHHl--llBHHHHHHHHH---H-H--
2JQ4 --HHHHHHHHHHlllllllHHHllllllH-HHHlllHHHHHHHHHHHHHHHlllllHHHHllHHHHlHHHHHHHHHHHHHlHHH
1ACP llHHHHHHHHHHHlll--llllllllllllllllllHHHHHHHHHHHHHHHlllllHHHHll--llllHHHHHHHH---H-Hl-
AcpXL lHHHHHHHHHHHHHll--lHHHlllllBllllllllHHHHHHHHHHHHHHHlllllHHHHll---l-----lB-----------
Score 458899888988888900878898898786888888998899998898988789988988880067666667866600060530
Figure 3: Secondary structure of the residues of acyl carrier proteins. H denotes an α-helix (of any type), B denotes
a β-bridge or β-sheet, and l denotes a turn, bend, or loop.
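The sequence identities referred to when judging the alignment in Figure 3 can be checked directly from the aligned rows. The sketch below is a minimal illustration, not part of msTALI: it counts identical residues over the columns where both rows have a residue. The function name is hypothetical, and the three strings are short excerpts (the columns starting at "ek--") copied from the 2FAC, 1L0I, and 2EHS rows of Figure 3.

```python
GAP = "-"  # gap symbol used in the alignment rows


def pairwise_identity(row_a, row_b):
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != GAP and b != GAP]
    if not aligned:
        return 0.0  # no column in which both rows have a residue
    matches = sum(a == b for a, b in aligned)
    return matches / len(aligned)


# Excerpts of the aligned residue rows in Figure 3 (same 16 columns in each row).
excerpt = {
    "2FAC": "ek--ittvqaaidyin",
    "1L0I": "ek--mttvqaaidyin",
    "2EHS": "ek--iqtvgdvinylk",
}

for name in ("1L0I", "2EHS"):
    identity = pairwise_identity(excerpt["2FAC"], excerpt[name])
    print(f"2FAC vs {name}: {identity:.2f}")
```

Other conventions for the denominator exist (for example, the length of the shorter sequence), so identity values computed this way are only comparable with values computed the same way.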
AcpXL, is computationally modeled using I-TASSER [19]. These structures were simultaneously aligned using msTALI.

The full alignment of all acyl carrier protein structures is shown in Figure 3. The three crystal structures are aligned perfectly with respect to one another. One NMR structure, 1ACP, has high sequence similarity to the crystal structures. It is clearly aligned well, as shown by its sequence identity and secondary structure similarity to the crystal structures. The second NMR structure, 2JQ4, has much lower sequence identity, and so its alignment is more difficult to evaluate. However, the conserved secondary structure elements are aligned well with respect to the crystal structure 2FAC. The conserved secondary structure elements (the first, second, fourth, and fifth α-helices of 2JQ4) are aligned to their corresponding helices from 2FAC. The alignment of the third α-helix of 2JQ4 to loop regions from other structures is reasonable, given that the surrounding helices are precisely aligned and only a single gap was inserted in this region. Finally, 2JQ4 has three α-helices at its C-terminus, whereas the other structures have only one. TALI has aligned the second helix from 2JQ4 to the single helix from the other proteins.

The computational structure, AcpXL, has a high sequence identity to the crystal structures, so the alignment is easy to assess. In addition, the secondary structure elements are identical. From these observations, we see that these structures are precisely aligned to their crystal structure counterparts. The only questionable part is the tail of AcpXL, which appears to be shifted to the right by six residues.

The (rigid) protein structures were aligned in MolMol [20] using the major conserved region from each protein, as indicated by msTALI. This region is annotated in Figure 3.

A full alignment of this set of structures provides valuable insight into this protein family. When considering only the three crystal structures, the structures are highly similar, with pairwise backbone RMSDs ranging from 0.39 to 0.67. Incorporating the NMR structures as well provides insight into the regions of similarity and difference between the crystal and NMR structures. Figure 4 displays a partial region from this alignment, isolating the two NMR structures and a single crystal structure. The α-helices on the right are aligned well, while the helical regions in the top left show a clear divergence. These regions of divergence could be highly dynamic regions or areas of varying functionality.

Figure 4: Alignment of the crystal structure 2EHS, in green, with the NMR structures 2JQ4, in magenta, and 1ACP, in silver.

Finally, displaying the multiple structure alignment between the crystal structure 2EHS and the computationally modeled structure AcpXL provides valuable information on the conservation of core structural and functional residues in the modeled structure. This is illustrated in Figure 5.

Figure 5: Alignment of the crystal structure 2EHS, in green, to the computational structure AcpXL, in purple.

4 Conclusion

We have presented an algorithm for aligning multiple protein structures. It is computationally efficient, with a computational complexity on the same order as multiple sequence alignment with ClustalW. It provides both pairwise and multiple structure alignments using several germane biochemical and biophysical properties.

An alignment between multiple protein structures provides a wealth of information about a protein family. For example, aligning proteins that contain a common active site could provide a method to identify the key residues in the active site. It could also identify the structural elements required to position the atoms in the active site correctly. Aligning structurally related proteins can also elucidate their structural differences, which may provide insight into the structural components required to perform particular functions.

msTALI is currently implemented in MATLAB®, a flexible development environment centered around matrices. We plan to re-implement it in C++ for speed and general-purpose use, and also to make a web version available. Furthermore, msTALI could be beneficial in a number of other tools from our lab, such as PDPA.

Acknowledgements

Thanks to the Rothberg Fellows program in the Department of Computer Science and Engineering at the University of South Carolina for their support to PGS.

5 Bibliography

[1] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN & Bourne PE. The Protein Data Bank. Nucleic Acids Res (2000) 28: pp. 235-242.
[2] Murzin AG, Brenner SE, Hubbard T & Chothia C. SCOP - A Structural Classification Of Proteins Database For The Investigation Of Sequences And Structures. J Mol Biol (1995) 247: pp. 536-540.
[3] Holm L & Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol (1993) 233: pp. 123-138.
[4] Shindyalov IN & Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering (1998) 11: pp. 739-747.
[5] Krissinel E & Henrick K. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallographica Section D-Biological Crystallography (2004) 60: pp. 2256-2268.
[6] Ortiz AR, Strauss CEM & Olmea O. MAMMOTH (Matching molecular models obtained from theory): An automated method for model comparison. Protein Science (2002) 11: pp. 2606-2621.
[7] Lupyan D, Leo-Macias A & Ortiz AR. A new progressive-iterative algorithm for multiple structure alignment. Bioinformatics (2005) 21: pp. 3255-3263.
[8] Shatsky M, Nussinov R & Wolfson HJ. A method for simultaneous alignment of multiple protein structures. Proteins (2004) 56: pp. 143-156.
[9] Guda C, Scheeff ED, Bourne PE & Shindyalov IN. A new algorithm for the alignment of multiple protein structures using Monte Carlo optimization. Pac Symp Biocomput (2001): pp. 275-286.
[10] Krissinel E & Henrick K. Multiple Alignment of Protein Structures in Three Dimensions. Lecture Notes in Computer Science (2005) 3695: pp. 67-78.
[11] Miao X, Waddell PJ & Valafar H. TALI: local alignment of protein structures using backbone torsion angles. J Bioinform Comput Biol (2008) 6: pp. 163-181.
[12] Needleman SB & Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol (1970) 48: pp. 443-453.
[13] Ramachandran GN, Ramakrishnan C & Sasisekharan V. Stereochemistry Of Polypeptide Chain Configurations. J Mol Biol (1963) 7: pp. 95-99.
[14] Kyte J & Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol (1982) 157: pp. 105-132.
[15] Kabsch W & Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers (1983) 22: pp. 2577-2637.
[16] Thompson JD, Higgins DG & Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res (1994) 22: pp. 4673-4680.
[17] Saitou N & Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol (1987) 4: pp. 406-425.
[18] The UniProt Consortium. The universal protein resource (UniProt). Nucleic Acids Res (2008) 36: pp. D190-D195.
[19] Zhang Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics (2008) 9: p. 40.
[20] Koradi R, Billeter M & Wuthrich K. MOLMOL: A program for display and analysis of macromolecular structures. Journal of Molecular Graphics (1996) 14: pp. 51-55.