@@ -12,8 +12,8 @@ For more info see the Wikipedia article on [protein structure alignment](http://
## Alignment Algorithms supported by BioJava

BioJava comes with a number of algorithms for aligning structures. The following
- five options are displayed by default in the user interface, although others can
- be accessed programmatically using the methods in
+ five options are displayed by default in the graphical user interface (GUI),
+ although others can be accessed programmatically using the methods in
[StructureAlignmentFactory](http://www.biojava.org/docs/api/org/biojava/bio/structure/align/StructureAlignmentFactory.html).

1. Combinatorial Extension (CE)
@@ -153,6 +153,7 @@ Additional methods can be added by implementing the
interface.


+
## Creating alignments programmatically

The various structure alignment algorithms in BioJava implement the
@@ -161,16 +162,17 @@ The various structure alignment algorithms in BioJava implement the
alignment and print some information about it.

```java
+ // Fetch CA atoms for the structures to be aligned
String name1 = "3cna.A";
String name2 = "2pel";
-
AtomCache cache = new AtomCache();
-
Atom[] ca1 = cache.getAtoms(name1);
Atom[] ca2 = cache.getAtoms(name2);

+ // Get StructureAlignment instance
StructureAlignment algorithm = StructureAlignmentFactory.getAlgorithm(CeCPMain.algorithmName);

+ // Perform the alignment
AFPChain afpChain = algorithm.align(ca1,ca2);

// Print text output
@@ -180,13 +182,50 @@ System.out.println(afpChain.toCE(ca1,ca2));
To display the alignment using jMol, use:

```java
- // Or StructureAlignmentDisplay.display(afpChain, ca1, ca2);
GuiWrapper.display(afpChain, ca1, ca2);
+ // Or StructureAlignmentDisplay.display(afpChain, ca1, ca2);
```

Note that these require that you include the structure-gui package and the jmol
binary in the classpath at runtime.

+ ## Command-line tools
+
+ ## PDB-wide database searches
+
+ The Alignment GUI also provides functionality for PDB-wide structural searches.
+ This systematically compares a structure against a non-redundant set of all
+ other structures in the PDB at either a chain or a domain level. Representatives
+ are selected using the RCSB's clustering of proteins with 40% sequence identity,
+ as described
+ [here](http://www.rcsb.org/pdb/static.do?p=general_information/cluster/structureAll.jsp).
+ Domains are selected using either SCOP (when available) or the
+ ProteinDomainParser algorithm.
+
+ ![Database Search GUI](img/database_search.png)
+
+ To perform a database search, select the 'Database Search' tab, then choose a
+ query structure by PDB ID, by SCOP domain ID, or from a custom file. The
+ output directory will be used to store the results, which consist of individual
+ alignments in compressed XML format and a tab-delimited file of similarity
+ scores and statistics. The statistics are displayed in an interactive results
+ table, which allows the alignments to be sorted. The 'Align' column allows
+ individual alignments to be visualized with the alignment GUI.
+
+ ![Database Search Results](img/database_search_results.png)
+
+ Be aware that this process can be very time-consuming. Before starting a manual
+ search, it is worth considering whether a pre-computed result may be available
+ online, for instance for
+ [FATCAT-rigid](http://www.rcsb.org/pdb/static.do?p=general_information/cluster/structureAll.jsp)
+ or [DALI](http://ekhidna.biocenter.helsinki.fi/dali/start). For custom files or
+ specific domains, a few optimizations can reduce the time for a database search.
+ Downloading PDB files is a considerable bottleneck. This can be avoided by
+ downloading all PDB files in advance from the
+ [FTP server](ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/) and
+ setting the `PDB_DIR` environment variable. This sped up the search from
+ about 30 hours to less than 4 hours.
+
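As a rough illustration of the local-mirror setup described above, the sketch below computes where an entry would sit in a local copy of the wwPDB `divided/pdb/` tree, in which each file is stored under the two middle characters of its PDB ID (for example `cn/pdb3cna.ent.gz` for 3CNA). It uses only the Java standard library and is not part of BioJava. The class name `LocalPdbMirror`, the helper `dividedPath`, and the assumption that `PDB_DIR` points directly at the mirrored `divided/pdb/` directory are all hypothetical and may not match how a given BioJava version lays out its cache.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not a BioJava class.
class LocalPdbMirror {

    /**
     * Builds the expected location of an entry inside a local copy of the
     * wwPDB "divided" tree: <root>/<middle two characters>/pdb<id>.ent.gz
     */
    static Path dividedPath(Path root, String pdbId) {
        if (pdbId == null || pdbId.length() != 4) {
            throw new IllegalArgumentException("Expected a 4-character PDB ID: " + pdbId);
        }
        String id = pdbId.toLowerCase(Locale.ROOT);
        String middle = id.substring(1, 3); // e.g. "cn" for "3cna"
        return root.resolve(middle).resolve("pdb" + id + ".ent.gz");
    }

    public static void main(String[] args) {
        // Assumption: PDB_DIR points at the directory mirroring divided/pdb/
        String pdbDir = System.getenv("PDB_DIR");
        if (pdbDir == null) {
            System.out.println("PDB_DIR is not set");
            return;
        }
        Path entry = dividedPath(Paths.get(pdbDir), "3cna");
        System.out.println(entry + (Files.exists(entry) ? " found" : " missing"));
    }
}
```

Running it with `PDB_DIR` set reports whether the example entry is already present locally, which is a quick way to check a mirror before starting a long search.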

## Acknowledgements