Commit 4e40f5e

Adding Command-line tools section
Also moved up the database search section & fixed some spellings.
1 parent 051135a commit 4e40f5e

1 file changed: +62 -43 lines changed

structure/alignment.md

Lines changed: 62 additions & 43 deletions
@@ -28,25 +28,30 @@ in 3D. See below for descriptions of the algorithms.

 ## Alignment User Interface

-Before going the details how to use the algorithms programmatically, let's take a look at the user interface that cames with the *biojava-structure-gui* module.
+Before going the details how to use the algorithms programmatically, let's take
+a look at the user interface that cames with the *biojava-structure-gui* module.

-<pre>
-AlignmentGui.getInstance();
-</pre>
+```java
+AlignmentGui.getInstance();
+```

-shows the following user interface.
+This code shows the following user interface:

 ![Alignment GUI](img/alignment_gui.png)

-You can manually select protein chains, domains, or custom files to be aligned. Try to align 2hyn vs. 1zll. This will show the results in a graphical way, in 3D:
+You can manually select protein chains, domains, or custom files to be aligned.
+Try to align 2hyn vs. 1zll. This will show the results in a graphical way, in
+3D:

 ![3D Alignment of PDB IDs 2hyn and 1zll](img/2hyn_1zll.png)

 and also a 2D display, that interacts with the 3D display

 ![2D Alignment of PDB IDs 2hyn and 1zll](img/alignmentpanel.png)

-The functionality to perform and visualize these alignments can of course be used also from your own code. Let's first have a look at the alignment algorithms:
+The functionality to perform and visualize these alignments can of course be
+used also from your own code. Let's first have a look at the alignment
+algorithms.

 ## The Alignment Algorithms

@@ -60,7 +65,7 @@ structure, and then combining those to try to align the most residues possible
 while keeping the overall RMSD of the superposition low.

 CE is a rigid-body alignment algorithm, which means that the structures being
-compared are kept fixed during superpositon. In some cases it may be desirable
+compared are kept fixed during superposition. In some cases it may be desirable
 to break large proteins up into domains prior to aligning them (by manually
 inputing a subrange, using the [SCOP or CATH databases](externaldb.md), or by
 decomposing the protein automatically using the [Protein Domain
@@ -77,7 +82,7 @@ related by a circular permutation, the N-terminal part of one protein is related
 to the C-terminal part of the other, and vice versa. CE-CP allows circularly
 permuted proteins to be compared. For more information on circular
 permutations, see the
-[wikipedia](http://en.wikipedia.org/wiki/Circular_permutation_in_proteins) or
+[Wikipedia](http://en.wikipedia.org/wiki/Circular_permutation_in_proteins) or
 [Molecule of the
 Month](http://www.pdb.org/pdb/101/motm.do?momID=124&evtc=Suggest&evta=Moleculeof%20the%20Month&evtl=TopBar)
 articles.
@@ -140,7 +145,7 @@ The following methods are not presented in the user interface by default:

 * [BioJavaStructureAlignment](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/BioJavaStructureAlignment.html)
 A structure-based alignment method able of returning multiple alternate
-alignments. It was writen by Andreas Prlic and based on the PSC++ algorithm
+alignments. It was written by Andreas Prli&#263; and based on the PSC++ algorithm
 provided by Peter Lackner.
 * [CeSideChainMain](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/ce/CeSideChainMain.html)
 A variant of CE using CB-CB distances, which sometimes improves alignments in
@@ -152,6 +157,40 @@ Additional methods can be added by implementing the
 [StructureAlignment](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/StructureAlignment.html)
 interface.

+## PDB-wide database searches
+
+The Alignment GUI also provides functionality for PDB-wide structural searches.
+This systematically compares a structure against a non-redundant set of all
+other structures in the PDB at either a chain or a domain level. Representatives
+are selected using the RCSB's clustering of proteins with 40% sequence identity,
+as described
+[here](http://www.rcsb.org/pdb/static.do?p=general_information/cluster/structureAll.jsp).
+Domains are selected using either SCOP (when available) or the
+ProteinDomainParser algorithm.
+
+![Database Search GUI](img/database_search.png)
+
+To perform a database search, select the 'Database Search' tab, then choose a
+query structure based on PDB ID, SCOP domain id, or from a custom file. The
+output directory will be used to store results. These consist of individual
+alignments in compressed XML format, as well as a tab-delimited file of
+similarity scores and statistics. The statistics are displayed in an interactive
+results table, which allows the alignments to be sorted. The 'Align' column
+allows individual alignments to be visualized with the alignment GUI.
+
+![Database Search Results](img/database_search_results.png)
+
+Be aware that this process can be very time consuming. Before
+starting a manual search, it is worth considering whether a pre-computed result
+may be available online, for instance for
+[FATCAT-rigid](http://www.rcsb.org/pdb/static.do?p=general_information/cluster/structureAll.jsp)
+or [DALI](http://ekhidna.biocenter.helsinki.fi/dali/start). For custom files or
+specific domains, a few optimizations can reduce the time for a database search.
+Downloading PDB files is a considerable bottleneck. This can be solved by
+downloading all PDB files from the [FTP
+server](ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/) and setting
+the `PDB_DIR` environmental variable. This operation sped up the search from
+about 30 hours to less than 4 hours.


 ## Creating alignments programmatically
@@ -186,45 +225,25 @@ GuiWrapper.display(afpChain, ca1, ca2);
 // Or StructureAlignmentDisplay.display(afpChain, ca1, ca2);
 ```

-Note that these require that you include the structure-gui package and the jmol
+Note that these require that you include the structure-gui package and the jMol
 binary in the classpath at runtime.

 ## Command-line tools

-## PDB-wide database searches
-
-The Alignment GUI also provides functionality for PDB-wide structural searches.
-This systematically compares a structure against a non-redundant set of all
-other structures in the PDB at either a chain or a domain level. Representatives
-are selected using the RCSB's clustering of proteins with 40% sequence identity,
-as described
-[here](http://www.rcsb.org/pdb/static.do?p=general_information/cluster/structureAll.jsp).
-Domains are selected using either SCOP (when available) or the
-ProteinDomainParser algorithm.
-
-![Database Search GUI](img/database_search.png)
-
-To perform a database search, select the 'Database Search' tab, then choose a
-query structure based on PDB ID, SCOP domain id, or from a custom file. The
-output directory will be used to store results. These consist of individual
-alignments in compressed XML format, as well as a tab-delimited file of
-similarity scores and statistics. The statistics are displayed in an interactive
-results table, which allows the alignments to be sorted. The 'Align' column
-allows individual alignments to be visualized with the alignment GUI.
+Many of the alignment algorithms are available in the form of command line
+tools. These can be accessed through the main methods of the StructureAlignment
+classes. Tar bundles are also available with scripts for running
+[CE and FATCAT](http://source.rcsb.org/jfatcatserver/download.jsp).

-![Database Search Results](img/database_search_results.png)
+Example:
+```bash
+runCE.sh -pdb1 4hhb.A -pdb2 4hhb.B -show3d
+```

-Be aware that this process can be very time consuming. Before
-starting a manual search, it is worth considering whether a pre-computed result
-may be available online, for instance for
-[FATCAT-rigid](http://www.rcsb.org/pdb/static.do?p=general_information/cluster/structureAll.jsp)
-or [DALI](http://ekhidna.biocenter.helsinki.fi/dali/start). For custom files or
-specific domains, a few optimizations can reduce the time for a database search.
-Downloading PDB files is a considerable bottleneck. This can be solved by
-downloading all PDB files from the [FTP
-server](ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/) and setting
-the `PDB_DIR` environmental variable. This operation sped up the search from
-about 30 hours to less than 4 hours.
+Using the command line tool it is possible to run pairwise alignments, several
+alignments in batch mode, or full database searches. Some additional parameters
+are available which are not exposed in the GUI, such as outputting results to a
+file in various formats.


 ## Acknowledgements
