Commit f07ce29

Adding addition alignment algorithms & code examples
1 parent fc70659 commit f07ce29

1 file changed: +88 -4 lines changed

structure/alignment.md

Lines changed: 88 additions & 4 deletions
@@ -3,30 +3,38 @@ Protein Structure Alignment
33

44
## What is a structure alignment?
55

6-
A **Structural alignment** attempts to establish equivalences between two or more polymer structures based on their shape and three-dimensional conformation. In contrast to simple structural superposition (see below), where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions.
6+
A **structural alignment** attempts to establish equivalences between two or more polymer structures based on their shape and three-dimensional conformation. In contrast to simple structural superposition (see below), where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions.
77

88
Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be exercised when using the results as evidence for shared evolutionary ancestry, because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.
99

1010
For more info see the Wikipedia article on [protein structure alignment](http://en.wikipedia.org/wiki/Structural_alignment).
1111

1212
## Alignment Algorithms supported by BioJava
1313

14-
BioJava comes with implementations of the Combinatorial Extension (CE) and FATCAT algorithms. Both algorithms come in two variations, as such one can say that BioJava supports the following four algorithms.
14+
BioJava comes with a number of algorithms for aligning structures. The following
15+
five options are displayed by default in the user interface, although others can
16+
be accessed programmatically using the methods in
17+
[StructureAlignmentFactory](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/StructureAlignmentFactory.html).
1518

1619
1. Combinatorial Extension (CE)
1720
2. Combinatorial Extension with Circular Permutation (CE-CP)
1821
3. FATCAT - rigid
1922
4. FATCAT - flexible.
23+
5. Smith-Waterman superposition
24+
25+
CE and FATCAT both use structural similarity to align the proteins, while
26+
Smith-Waterman performs a local sequence alignment and then displays the result
27+
in 3D. See below for descriptions of the algorithms.
2028

2129
## Alignment User Interface
2230

2331
Before going into the details of how to use the algorithms programmatically, let's take a look at the user interface that comes with the *biojava-structure-gui* module.
2432

2533
<pre>
2634
AlignmentGui.getInstance();
27-
</pre>
35+
</pre>
2836

29-
shows the following user interface.
37+
shows the following user interface.
3038

3139
![Alignment GUI](img/alignment_gui.png)
3240

@@ -59,6 +67,8 @@ decomposing the protein automatically using the [Protein Domain
5967
Parser](http://www.biojava.org/docs/api/org/biojava/bio/structure/domain/LocalProteinDomainParser.html)
6068
algorithm).
6169

70+
BioJava class: [org.biojava.bio.structure.align.ce.CeMain](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/ce/CeMain.html)
71+
6272
### Combinatorial Extension with Circular Permutation (CE-CP)
6373

6474
CE and FATCAT both assume that aligned residues occur in the same order in both
@@ -82,6 +92,8 @@ proteins will be shown in different colors:
8292

8393
CE-CP was developed by Spencer E. Bliven, Philip E. Bourne, and Andreas Prlić.
8494

95+
BioJava class: [org.biojava.bio.structure.align.ce.CeCPMain](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/ce/CeCPMain.html)
96+
8597
### FATCAT - rigid
8698

8799
This is a Java implementation of the original FATCAT algorithm by [Yuzhen Ye
@@ -91,6 +103,8 @@ It performs similarly to CE for most proteins. The 'rigid' flavor uses a
91103
rigid-body superposition and only considers alignments with matching sequence
92104
order.
93105

106+
BioJava class: [org.biojava.bio.structure.align.fatcat.FatCatRigid](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/fatcat/FatCatRigid.html)
107+
94108
### FATCAT - flexible
95109

96110
FATCAT-flexible introduces 'twists' between different parts of the proteins
@@ -104,6 +118,76 @@ this is that it can lead to additional false positives in unrelated structures.
104118
![(Left) Rigid and (Right) flexible alignments of
105119
calmodulin](img/1cfd_1cll_fatcat.png)
106120

121+
BioJava class: [org.biojava.bio.structure.align.fatcat.FatCatFlexible](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/fatcat/FatCatFlexible.html)
122+
123+
### Smith-Waterman
124+
125+
This aligns residues based on Smith and Waterman's 1981 algorithm for local
126+
*sequence* alignment. No structural information is included in the alignment, so
127+
this only works for proteins with significant sequence similarity. It uses the
128+
Blosum65 scoring matrix.
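
To make the local-alignment idea concrete, here is a minimal sketch of the Smith-Waterman scoring recurrence in plain Java. It is not BioJava's implementation: the class name `SmithWatermanSketch`, the flat match/mismatch scores (used here in place of a substitution matrix such as Blosum65), and the linear gap penalty are all assumptions chosen to keep the example short.

```java
public class SmithWatermanSketch {
    // Illustrative scores only (assumption); a real aligner would look these
    // up in a substitution matrix and typically use affine gap penalties.
    static final int MATCH = 2;
    static final int MISMATCH = -1;
    static final int GAP = -2;

    /** Returns the best local alignment score between sequences a and b. */
    static int localScore(String a, String b) {
        // h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
        int[][] h = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                int diag = h[i - 1][j - 1] + sub; // align the two residues
                int up = h[i - 1][j] + GAP;       // gap in b
                int left = h[i][j - 1] + GAP;     // gap in a
                // Flooring at 0 is what makes the alignment local:
                // a poorly scoring prefix is simply dropped.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(localScore("HEAGAWGHEE", "PAWHEAE"));
    }
}
```

Because only residue identities enter the score, two proteins with similar folds but unrelated sequences would score near zero here, which is why this approach needs significant sequence similarity.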
129+
130+
The two structures are superimposed based on this alignment. Be aware that errors
131+
locating gaps can lead to high RMSD in the resulting superposition due to a
132+
small number of badly aligned residues. However, this method is faster than
133+
the structure-based methods.
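
The effect of a few misaligned residues on RMSD can be illustrated with a small sketch. The class `RmsdSketch` below is hypothetical and not part of BioJava; it assumes the two coordinate sets are already superimposed and paired by the alignment, and only computes the root-mean-square deviation over those pairs.

```java
public class RmsdSketch {
    /**
     * RMSD over paired 3D points. a[i] is assumed to be the aligned partner
     * of b[i], and both sets are assumed to be already superimposed.
     */
    static double rmsd(double[][] a, double[][] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need equal-length, non-empty point sets");
        }
        double sumSq = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < 3; k++) {
                double d = a[i][k] - b[i][k];
                sumSq += d * d;
            }
        }
        return Math.sqrt(sumSq / a.length);
    }

    public static void main(String[] args) {
        // Ten aligned pairs, each displaced by 1 unit along x.
        double[][] a = new double[10][3];
        double[][] b = new double[10][3];
        for (int i = 0; i < 10; i++) {
            b[i][0] = 1.0;
        }
        System.out.println("all pairs close:      " + rmsd(a, b));

        // Move a single pair 20 units apart, as a misplaced gap might.
        b[9][0] = 20.0;
        System.out.println("one misaligned pair:  " + rmsd(a, b));
    }
}
```

Since deviations are squared before averaging, a single badly placed pair dominates the result, which matches the caution above about gap-placement errors.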
134+
135+
BioJava class: [org.biojava.bio.structure.align.seq.SmithWaterman3Daligner](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/seq/SmithWaterman3Daligner.html)
136+
137+
### Other methods
138+
139+
The following methods are not presented in the user interface by default:
140+
141+
* [BioJavaStructureAlignment](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/BioJavaStructureAlignment.html)
142+
A structure-based alignment method capable of returning multiple alternate
143+
alignments. It was written by Andreas Prlic and based on the PSC++ algorithm
144+
provided by Peter Lackner.
145+
* [CeSideChainMain](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/ce/CeSideChainMain.html)
146+
A variant of CE using CB-CB distances, which sometimes improves alignments in
147+
proteins with parallel sheets and helices.
148+
* [OptimalCECPMain](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/ce/OptimalCECPMain.html)
149+
An alternate (much slower) algorithm for finding circular permutations.
150+
151+
Additional methods can be added by implementing the
152+
[StructureAlignment](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/StructureAlignment.html)
153+
interface.
154+
155+
156+
## Creating alignments programmatically
157+
158+
The various structure alignment algorithms in BioJava implement the
159+
`StructureAlignment` interface, and are normally accessed through
160+
`StructureAlignmentFactory`. Here's an example of how to create a CE-CP
161+
alignment and print some information about it.
162+
163+
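```java
// Imports follow the BioJava 3 package layout used by the API links in this document
import org.biojava.bio.structure.Atom;
import org.biojava.bio.structure.align.StructureAlignment;
import org.biojava.bio.structure.align.StructureAlignmentFactory;
import org.biojava.bio.structure.align.ce.CeCPMain;
import org.biojava.bio.structure.align.model.AFPChain;
import org.biojava.bio.structure.align.util.AtomCache;

// Identifiers of the two structures to align
String name1 = "3cna.A";
String name2 = "2pel";

// The cache fetches structures and keeps local copies
AtomCache cache = new AtomCache();

// Representative atoms of each structure
Atom[] ca1 = cache.getAtoms(name1);
Atom[] ca2 = cache.getAtoms(name2);

// Look up the CE-CP implementation by name
StructureAlignment algorithm = StructureAlignmentFactory.getAlgorithm(CeCPMain.algorithmName);

AFPChain afpChain = algorithm.align(ca1, ca2);

// Print text output
System.out.println(afpChain.toCE(ca1, ca2));
```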
179+
180+
To display the alignment using Jmol, use:
181+
182+
```java
// Or StructureAlignmentDisplay.display(afpChain, ca1, ca2);
GuiWrapper.display(afpChain, ca1, ca2);
```
186+
187+
Note that both display calls require the *biojava-structure-gui* module and the Jmol
188+
binary on the classpath at runtime.
189+
190+
107191
## Acknowledgements
108192

109193
Thanks to P. Bourne, Yuzhen Ye and A. Godzik for granting permission to freely use and redistribute their algorithms.
