Eramian ProteinSci 2008
Protein Sci. 2008 17: 1881-1893; originally published online Oct 1, 2008; doi:10.1110/ps.036061.108
Abstract
Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: They frequently contain significant errors, and their accuracy cannot be readily assessed. We have addressed the latter limitation by developing a protocol optimized specifically for predicting the Cα root-mean-squared deviation (RMSD) and native overlap (NO3.5Å) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores that merely predict one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether or not the model is suitable for intended applications. The assessment relies on a model-specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: If possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary structure composition. This protocol predicts the RMSD and NO3.5Å errors for a diverse set of 580,317 comparative models of 6174 sequences with correlation coefficients (r) of 0.84 and 0.86, respectively, to the actual errors. This scoring function achieves the best correlation compared to 13 other tested assessment criteria that achieved correlations ranging from 0.35 to 0.71.
The explosive growth of sequence databases has not been accompanied by commensurate growth of the protein structure database, the Protein Data Bank (PDB) (Berman et al. 2000). Of the millions of known protein sequences, well fewer than 1% of their corresponding structures have been solved experimentally. Computationally derived structure models serve to bridge this gap, owing to the prediction of two orders of magnitude more structures than are currently available (Pieper et al. 2006). In the absence of an experimentally determined structure, such computational models are often valuable for generating testable hypotheses and giving insight into existing experimental data (Baker and Sali 2001).

Computationally derived structure models, however, generally suffer two major limitations that can limit their utility: They frequently contain significant errors, and their accuracy cannot be readily assessed. Indeed, even if a method sometimes produces accurate solutions, the average precision is still low (Baker and Sali 2001; Bradley et al. 2005). There is currently no practical way to easily and robustly assess the accuracy of a predicted structure, which is problematic for the end users of the models, who cannot be certain that a model is accurate enough in the region(s) of interest to give meaningful biological insight.

Reprint requests to: Andrej Sali, University of California at San Francisco, Byers Hall, Suite 503B, 1700 4th Street, San Francisco, CA 94158, USA; e-mail: sali@salilab.org; fax: (415) 514-4231. Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.036061.108.
Protein Science (2008), 17:1881-1893. Published by Cold Spring Harbor Laboratory Press. Copyright © 2008 The Protein Society.
It is often only after performing time-consuming experiments that a model's accuracy is determined reliably.

Comparative modeling is the most widely used and generally most accurate class of protein structure prediction approaches (Marti-Renom et al. 2000; Tramontano et al. 2001; Eswar et al. 2007). The accuracy of a comparative model is weakly correlated with the sequence identity shared between the target sequence and the template structure(s) used in the modeling procedure (Sanchez et al. 2000). At high sequence identity ranges (i.e., over 50% sequence identity), comparative models can be accurate enough to be useful in virtual ligand screening or for inferring the catalytic mechanism of an enzyme (Bjelic and Aqvist 2004; Caffrey et al. 2005; Chmiel et al. 2005; Costache et al. 2005; Xu et al. 2005). At lower values of sequence identity, especially below 30%, alignment errors and differences between the target and template structures can become major sources of errors (Chothia and Lesk 1986; Rost 1999; Sauder et al. 2000; Jaroszewski et al. 2002; Ginalski et al. 2005; Madhusudhan et al. 2006; Rai and Fiser 2006). In automated comparative modeling of all known protein sequences related to at least one known structure, 76% of all models are from alignments in which the target and template share less than 30% sequence identity (Pieper et al. 2006), where the corresponding models can have a wide range of accuracies (Sanchez et al. 2000; Chakravarty and Sanchez 2004).

Because of the wide accuracy range of models, many assessment scores have been developed for tasks including (1) determining whether or not a model has the correct fold (Miyazawa and Jernigan 1996; Park and Levitt 1996; Domingues et al. 1999; Lazaridis and Karplus 1999; Gatchell et al. 2000; Melo et al. 2002; McGuffin and Jones 2003; Melo and Sali 2007), (2) discriminating between the native and near-native states (Sippl 1993; Melo and Feytmans 1997, 1998; Park et al. 1997; Fiser et al. 2000; Lazaridis and Karplus 2000; Zhou and Zhou 2002; Seok et al. 2003; Tsai et al. 2003; Shen and Sali 2006), (3) selecting the most native-like model in a set of decoys that does not contain the native structure (Shortle et al. 1998; Wallner and Elofsson 2003; Eramian et al. 2006; Qiu et al. 2007), and (4) predicting the accuracy of a model in the absence of the native structure (Wallner and Elofsson 2003, 2006; Eramian et al. 2006; McGuffin 2007). Despite the large body of work devoted to the first three tasks, however, relatively little work has been devoted to the last task, predicting the absolute accuracy of computational models. Due to the enormity of the conformational search problem, prediction methods often produce a large number of models and use a score or scores to predict which are most accurate: These approaches determine the relative accuracy of models. However, even if the selection score worked perfectly (i.e., was able to identify the most accurate model from among the many models produced), the user does not necessarily have any sense of the absolute accuracy of the selected model. Although the selected model might be more accurate than the others produced, is it accurate enough? For example, is the best model expected to have a Cα root-mean-square deviation (RMSD) of 2.0 Å, or 9.0 Å? Nearly all traditional assessment scores do not address these questions, often reporting scores in pseudo-energy units or arbitrary values that correlate poorly with accuracy measures such as RMSD. Here, we use the phrase "absolute accuracy" to mean the actual geometrical accuracy, such as RMSD and MaxSub (Siew et al. 2000), which could be calculated if the true native structure were known. In the absence of the native structure, the absolute accuracy is not known and must be predicted.

Predicting the absolute accuracy of a model is particularly difficult due to the lack of principled reasons why an individual assessment score should correlate well with accuracy measures such as RMSD, particularly if the models are not native-like (Fiser et al. 2000). Attempts to predict absolute accuracy have included methods based on neural networks (Wallner and Elofsson 2003), support vector machines (SVMs) (Eramian et al. 2006), and multivariate regression (Tondel 2004). While such approaches can perform well for small families or are able to select the most native-like model in a set of decoys that does not contain the native structure, to our knowledge no approach has demonstrated a clear ability to predict the absolute accuracy of a large, diverse set of models representative of real-world use cases.

Here, we describe a protocol for predicting absolute accuracy by which a model-specific scoring function is developed using SVM regression. For an input comparative model, a unique training set is created from an extremely large database of models of known accuracy (i.e., their native structures are known and their accuracies can thus be calculated). Two predictions are made from this training set for each query structure model: (1) the RMSD of the model and (2) the native overlap (NO3.5Å), where native overlap is defined as the fraction of Cα atoms in a model that are within 3.5 Å of the corresponding atoms in the native structure after rigid body superposition of the model to the native structure (Sanchez et al. 2000). By creating a model-specific tailored training set consisting only of models structurally similar to the assessed model, we gain the ability to predict RMSD and NO3.5Å with a high correlation to the actual RMSD and NO3.5Å values for a diverse set of 580,317 comparative models (r = 0.84 and 0.86, respectively).

We begin by describing the performance of our score at predicting absolute accuracy (Results). We then discuss the implications and application of our approach for large-scale computational prediction efforts (Discussion). Finally, we describe the test set and testing database, the metrics used to evaluate accuracy, and the process for developing the score (Methods).

Results

Test set properties

An extremely large test set of 580,317 comparative models from 6174 sequences was constructed to test our protocol. The properties of the set mirror those observed in large-scale protein structure prediction efforts (Pieper et al. 2006). Most models (461,202 models; 80%) were from alignments in which the sequence identity between the target and template was under 30%, and 94% (541,238 models) had less than 40% sequence identity (Fig. 1A). The median length of the input sequences was 181 residues, and the median model size was 111 residues (Fig. 1B). Though the median sequence length was longer than the ~156-residue average size of protein domains found in the PDB (Berman et al. 2000; Shen et al. 2005), 78% of the models (455,347) were smaller than this size, reflecting that local, rather than global, alignments were used for modeling (Methods).

The accuracy distribution of the models was broad (Fig. 1C,D). The median RMSD value of the set was 7.0 Å, and the median NO3.5Å value was 0.46. Only 6% (36,063 models) had RMSD values <2.0 Å, a low number resulting from the filtering performed prior to construction of the test set, as well as the inability of the comparative modeling protocol to consistently produce models more native-like than the template structure.

Correlations between actual model accuracy and assessment scores

Correlation coefficients were calculated between the nine input features and the three geometric accuracy metrics (RMSD, NO3.5Å, and MaxSub).

Figure 1. Properties of the 580,317 model testing set (A-D). The y-axis on the left indicates the number of models that fall into the corresponding bin indicated by the x-axis. The line and right y-axis correspond to the cumulative percentage of total models having the appropriate feature. (A) The global sequence identity shared between target/template alignments of the test set. Approximately 80% of the models are from alignments in which the target and template share less than 30% sequence identity. (B) The length distribution of models in the test set (median = 111 amino acids). (C) The Cα RMSD distribution of the models, with a bin size equal to 2.0 Å (median = 7.0 Å). (D) The native overlap distribution, calculated using a cutoff of 3.5 Å (median = 0.46).
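The native-overlap measure defined above can be sketched in a few lines. The coordinates below are invented, and the rigid-body superposition step is assumed to have already been applied (the fit itself is not shown):

```python
import numpy as np

def native_overlap(model_ca, native_ca, cutoff=3.5):
    """Fraction of Calpha atoms in the model lying within `cutoff`
    angstroms of the corresponding native atoms.

    Both arrays are (N, 3) coordinates, assumed to be already
    superposed onto each other by a rigid-body fit.
    """
    model_ca = np.asarray(model_ca, dtype=float)
    native_ca = np.asarray(native_ca, dtype=float)
    distances = np.linalg.norm(model_ca - native_ca, axis=1)
    return float(np.mean(distances <= cutoff))

# A toy model whose first three residues sit on the native positions
# and whose fourth is displaced by 5 angstroms: NO3.5A = 3/4.
native = np.array([[0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [11.4, 0, 0]])
model = native + np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0], [5, 0, 0]])
print(native_overlap(model, native))  # 0.75
```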
The accuracy of models varied widely as sequence identity decreased (Fig. 2A), making sequence identity a relatively uninformative measure for estimating model accuracy. The Pearson correlation coefficient (r) between sequence identity and native overlap was only 0.54 (Table 1).

Of the nine features used for SVM training, N-DOPE had the highest correlation coefficient with RMSD, NO3.5Å, and MaxSub (Table 1). N-DOPE was particularly well suited for identifying near-native (N-DOPE scores below -1.5) or inaccurate (scores above 1.0) models. However, a majority of models (80%) had N-DOPE scores between -1.5 and 1.0, where N-DOPE was not strongly correlated with NO3.5Å (Fig. 2B) or MaxSub (Table 1). For example, the first and third quartile NO3.5Å values for models with N-DOPE values of ~0.0 were 0.15 and 0.64, respectively, giving a wide range around the median NO3.5Å value of 0.43. The correlation coefficient between N-DOPE and NO3.5Å was 0.71.

In contrast, the correlation between the actual and predicted native overlap was 0.86 (Fig. 2C). Furthermore, the median absolute difference between the actual and predicted NO3.5Å values was only 0.07, with first and third quartile values of 0.03 and 0.16, respectively. The split between predictions that were higher and lower than the actual values was 56% and 44%, respectively. The correlation between actual and predicted RMSD was 0.84, displaying great linearity even out to high RMSD values (Fig. 2D). The median absolute difference between the actual and predicted RMSD values was 1.3 Å for all 580,317 models. Considering only those models below 5.0 Å RMSD, the median absolute difference between the actual and predicted values was only 0.71 Å. RMSD predictions were also closely split between those that were higher (48%) and lower (52%) than the actual values.

ProQ and ModFOLD were used to compare the performance of the model-specific scoring approach to approaches that do not benefit from learning from a specific training set. The correlation between the actual and predicted MaxSub scores was highest for ProQ-SS,

Figure 2. The relationships between the actual NO3.5Å and sequence identity (A; r = 0.54); the normalized DOPE score (B; r = 0.71); and the predicted native overlap (C; r = 0.86). In each plot, the diameter of a bubble represents the number of examples contained in the 2D bin indicated by the x- and y-axes. The bubble size is comparable between the different plots. Additionally, the median value for each bin is depicted by the solid line, where the upper and lower error bars indicate the third and first quartile values, respectively. (D) The relationship between the predicted and actual RMSD (r = 0.84).
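The two summary statistics used throughout these comparisons, the Pearson correlation coefficient and the median absolute difference between actual and predicted values, can be computed as below; the RMSD arrays are invented for illustration:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def median_abs_diff(actual, predicted):
    """Median absolute difference between actual and predicted values."""
    return float(np.median(np.abs(np.asarray(actual, dtype=float)
                                  - np.asarray(predicted, dtype=float))))

actual_rmsd = [2.0, 4.5, 7.0, 9.5, 12.0]     # invented values
predicted_rmsd = [2.5, 4.0, 8.0, 9.0, 13.0]  # invented values
print(round(pearson_r(actual_rmsd, predicted_rmsd), 3))  # about 0.98 here
print(median_abs_diff(actual_rmsd, predicted_rmsd))      # 0.5
```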
at 0.72 (Table 1), significantly less accurate than the correlation between the predicted NO3.5Å and MaxSub of 0.83 given by our protocol. Thus, even though our protocol was designed to predict NO3.5Å and not MaxSub, the resulting predictions were much better correlated with the actual MaxSub scores than the ProQ predictions that were designed specifically for this task.

Table 1. The correlation coefficients (r) between the actual model accuracy and assessment scores on the full 580,317 model testing set

                                 RMSD   Native overlap (3.5 Å)   MaxSub
  Actual RMSD                    1      0.78                     0.78
  Actual NO3.5Å                  0.78   1                        0.9
  Actual MaxSub                  0.72   0.9                      1
  Predicted RMSD                 0.84   0.75                     0.74
  Predicted NO3.5Å               0.73   0.86                     0.83
  N-DOPE                         0.64   0.71                     0.73
  Sequence identity              0.43   0.54                     0.54
  Z-PAIR                         0.37   0.51                     0.55
  Z-SURFACE                      0.4    0.51                     0.55
  Z-COMBINED                     0.41   0.55                     0.59
  GA341                          0.54   0.67                     0.69
  Percentage unaligned residues  0.35   0.41                     0.37
  PSIPred agreement              0.51   0.58                     0.63
  PSIPred weighted               0.43   0.51                     0.55
  ProQ predicted LGScore         0.35   0.49                     0.5
  ProQ predicted MaxSub          0.5    0.63                     0.65
  ProQ (SS) predicted LGScore    0.44   0.58                     0.6
  ProQ (SS) predicted MaxSub     0.57   0.71                     0.72
  ProQres (SS)                   0.41   0.56                     0.58

ModFOLD was run on 36,453 randomly selected models for 225 sequences from our test set. The correlation coefficient between the ModFOLD score and RMSD for these 36,453 models was 0.51, and the correlation coefficient between NO3.5Å and the ModFOLD score was 0.63 (see Table 3). In comparison, TSVMod's correlations were 0.85 and 0.88, respectively, which is essentially identical to the values obtained for our full test set.

Fold assessment

Fold assessment is a particularly important problem at lower values of sequence identity, when it is possible that the template used to construct a model does not have the same fold. As expected, the sequence identity of the target/template pair used to construct the model was only marginally useful for assessing whether the model had the correct fold. The GA341 and N-DOPE scores, both developed for fold assessment, were better at classifying models correctly. By use of 0.30 as the NO3.5Å threshold for defining whether or not a model has the correct fold, the calculated areas under the ROC curve (Methods) for sequence identity, GA341, N-DOPE, and the predicted NO3.5Å were 0.80, 0.86, 0.87, and 0.93, respectively (Fig. 3). With a NO3.5Å threshold of 0.50, these values were 0.81, 0.84, 0.86, and 0.93, respectively (data not shown). Thus, using the predicted NO3.5Å value to classify whether or not a model has the correct fold was significantly more accurate than the other fold assessment scores tested.

Figure 3. Receiver operating characteristic (ROC) curves for four fold assessment classifiers: the predicted native overlap (solid black line); the normalized DOPE score (dashed black line); the GA341 score (solid gray line); and the sequence identity shared between the target and the template (dashed gray line). For each measure, the area under the curve is noted. A model was defined as having the correct fold if NO3.5Å ≥ 0.30.

Residue neighborhood accuracy

Two structure-derived properties, the solvent exposure state and the residue neighborhood (Chakravarty and Sanchez 2004), were calculated for 25,000 models of 100-200 residues randomly selected from the test set. The accuracy of a residue's neighborhood was calculated by comparing the contacts made by a residue with its neighbors in the model, versus those made by that residue in the native structure, thereby measuring the percentage of contacts that are accurately modeled. There was a clear decrease in the median neighborhood accuracy (Fig. 4A) for models constructed from target/template pairs sharing less than 40% sequence identity (96% of the 25,000 models), with an overall correlation of r = 0.57. In contrast, the neighborhood accuracy was more correlated with the predicted native overlap value (r = 0.82) (Fig. 4B), with much tighter first and third quartile error bars.

Residue exposure state

The second assessed structure-derived property was the exposure state of a residue.
Figure 4. Relationship between structure-derived properties and the predicted accuracy for 25,000 randomly selected models of length 100-200 amino acids. (A) The percentage of correct neighborhood (solid line) is plotted versus the sequence identity shared between the target and the template used to construct the model (r = 0.57). The solid line indicates the median value for the bin; the upper and lower error bars indicate the third and first quartile values for the bin, respectively. The columns indicate the fraction of examples that are contained in each bin (right y-axis). (B) Relationship between the predicted native overlap and the neighborhood accuracy of a model (r = 0.82). (C) The percentage of exposed residues correctly modeled as exposed versus sequence identity (r = 0.56). (D) Percentage of exposed residues correctly modeled as exposed versus predicted NO3.5Å (r = 0.65).
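The neighborhood accuracy plotted in panels A and B can be sketched as the fraction of a residue's native Cα contacts that are reproduced in the model. The 8 Å cutoff and two-residue sequence separation below are illustrative choices, not necessarily those of Chakravarty and Sanchez (2004):

```python
import numpy as np

def contacts(ca, cutoff=8.0, min_sep=2):
    """Set of residue pairs whose Calpha atoms lie within `cutoff`
    angstroms, skipping near-sequence neighbors. The thresholds are
    illustrative assumptions, not the paper's exact definition."""
    ca = np.asarray(ca, dtype=float)
    pairs = set()
    for i in range(len(ca)):
        for j in range(i + min_sep, len(ca)):
            if np.linalg.norm(ca[i] - ca[j]) <= cutoff:
                pairs.add((i, j))
    return pairs

def neighborhood_accuracy(model_ca, native_ca):
    """Fraction of native contacts that are also present in the model."""
    native = contacts(native_ca)
    if not native:
        return 1.0
    return len(native & contacts(model_ca)) / len(native)

# Toy example: a four-residue native structure with three contacts;
# mispositioning residue 3 in the model breaks two of them.
native = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0],
                   [7.6, 0.0, 0.0], [2.0, 3.0, 0.0]])
model = native.copy()
model[3] = [2.0, 30.0, 0.0]
print(round(neighborhood_accuracy(model, native), 3))  # 0.333
```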
A residue was defined as exposed if it had relative surface accessibility larger than 40%, using the method of Lee and Richards (1971) calculated by Naccess v2.1.1 (Hubbard et al. 1991). Exposure state accuracy was observed to be higher than neighborhood accuracy, with less decrease in accuracy as sequence identity fell below 40% (Fig. 4C). The overall correlation between the sequence identity and exposure state accuracy was 0.56, well below that between the predicted native overlap and correct exposure state (r = 0.65) (Fig. 4D).

Discussion

A limitation of comparative models is that their accuracy cannot be readily and robustly assessed. We have addressed this problem by developing a protocol for deriving SVM regression models optimized specifically for predicting the actual RMSD and NO3.5Å values of a model in the absence of its native structure. SVM regression was used to combine up to nine features (sequence identity, N-DOPE, Z-PAIR, Z-SURFACE, Z-COMBINED, percentage of gaps in the target/template alignment, GA341, and two PSIPRED/DSSP scores) extracted from a tailored training set unique for the query structure model being assessed. This protocol is able to predict the RMSD and NO3.5Å values for a large, diverse set of comparative models with correlation coefficients of 0.84 and 0.86, respectively, to the actual RMSD and NO3.5Å values (Table 1).

The test set used for this study consisted of 580,317 models, for 6174 sequences. This set is approximately an order of magnitude larger, and contains one to two orders of magnitude more sequences, than typical model assessment test sets (Samudrala and Levitt 2000; Tsai et al. 2003; Wallner and Elofsson 2003; Eramian et al. 2006). The properties of this set parallel those seen in large-scale comparative modeling, and the models span virtually all SCOP (Andreeva et al. 2004) fold types (Table 2), protein sizes (Fig. 1B), and accuracies (Fig. 1C,D).
There are two features of this test set that reflect the difference between our goal of predicting the absolute accuracy of comparative models and traditional model assessment tests. First, the set contains no native structures, only models. A common relative accuracy test is that the native structure scores lower than all other models (Gatchell et al. 2000). While it is certainly a necessary condition that the native state is separable from decoys, it is far from sufficient, particularly in real use cases, where the best model produced is often far from native. Second, only one model, rather than many, is built per alignment, because the goal was not to determine the ability of assessment scores to identify the best model from among sets of similar models. Such relative accuracy assessments are important because they more closely replicate the real-world conditions in which assessment scores are used. However, such tests overlook that even if the model assessment scores are able to both correctly identify the native structure from among a set of decoys and identify the most accurate model from among a set of similar models, an end-user still has little information about how accurate the model actually is.

Table 2. The correlation coefficients (r) between the actual model accuracy and assessment scores for different SCOP fold classes

                             No. of      No. of
                             sequences   models    RMSD   NO3.5Å
  Entire set                 6174        573,977   0.84   0.86
  NMR template               3124        90,257    0.76   0.74
  X-ray template             6166        483,720   0.84   0.87
  SCOP all                   3589        326,314   0.84   0.86
  All α (SCOP A)             795         52,905    0.85   0.86
  All β (B)                  839         142,614   0.83   0.85
  α/β (C)                    867         78,056    0.83   0.86
  α + β (D)                  878         62,402    0.87   0.88
  Multi-domain (E)           54          2615      0.88   0.91
  Membrane/cell surface (F)  76          2760      0.57   0.76
  Small (G)                  298         19,039    0.82   0.84
  Coiled coil (H)            83          3538      0.65   0.9
  Low resolution (I)         10          294       0.61   0.76
  Peptides (J)               64          752       0.48   0.52
  Designed (K)               17          746       0.76   0.84

The prediction of absolute accuracy is a difficult problem that has not been given great attention. It has been argued that there are no principled reasons why an individual assessment score should correlate with an accuracy metric, particularly if the model is not native-like (Fiser et al. 2000). Our own data support this contention, as all of the individual statistical potentials tested were relatively ill-suited for predicting absolute accuracy (Table 1). The DOPE (Discrete Optimized Protein Energy) score, for example, has been shown to be an extremely accurate model assessment score in a number of studies (Colubri et al. 2006; Eramian et al. 2006; Shen and Sali 2006; Fitzgerald et al. 2007; Marko et al. 2007; Lu et al. 2008), yet correlates poorly with accuracy measures such as RMSD and NO3.5Å when tested on our large test set (Fig. 2B). Attempts have been made to predict absolute accuracy by combining a number of assessment scores (Wallner and Elofsson 2003; Eramian et al. 2006; McGuffin 2007). Even in these studies, however, the reported correlation coefficients between the predicted and actual accuracy measures were low, ranging from 0.35 to 0.71; moreover, these results were obtained on much smaller and less diverse test sets than the set employed here. For example, when we tested the ProQ method on the 580,317 models of our test set, the correlation between the actual and predicted MaxSub was only 0.72 (Table 1). Not only was ProQ's correlation with MaxSub slightly lower than that of N-DOPE (r = 0.73), but it was far lower than the correlation between the predicted NO3.5Å and actual MaxSub obtained by the model-specific approach (r = 0.83), even though the MaxSub score was not predicted by the model-specific protocol. Had MaxSub been predicted in place of NO3.5Å, the performance gap between ProQ and the model-specific approach could only be larger. Similarly, the correlations between the accuracy measures and the ModFOLD scores were far lower than those between the actual and predicted values from TSVMod (Table 3).

Table 3. The correlation coefficients (r) between the actual model accuracy and assessment scores on a 36,453 model testing set

                              RMSD   Native overlap (3.5 Å)   MaxSub
  Predicted RMSD              0.85   0.78                     0.76
  Predicted NO3.5Å            0.76   0.88                     0.85
  N-DOPE                      0.69   0.73                     0.75
  ModFOLD                     0.51   0.63                     0.61
  ProQ (SS) predicted MaxSub  0.56   0.67                     0.67

These results illustrate the utility of constructing a scoring function specific for the input model. A unique feature of our approach is the optimization of the weights of the individual scores specifically for the fold and size of the model being assessed, rather than for a variety of proteins of many shapes and sizes. The difference between our approach and other composite scores is analogous to the difference between position-specific scoring matrices (PSSMs) and generalized substitution matrices (e.g., BLOSUM62) employed in alignment algorithms. The use of a tailored training set for optimizing the weights of input features is crucial, as different assessment scores perform better for different sizes and
shapes of proteins, and their contributions to the overall composite score need to be adjusted accordingly. Our results show the SVM algorithm can find appropriate weights for the features: First, optimally combining the features results in scores that correlate well with RMSD and NO3.5Å (Table 1), although the overall correlation coefficients of each of the input features are low; second, there is a linear relationship between the actual and predicted accuracy (Fig. 2C,D). Most importantly, not only do the predictions correlate well with the actual accuracy of the models, but also the difference between the actual and predicted values is small (Fig. 2C,D).

The primary advantage of our approach relative to most model assessment scores is that the prediction of absolute accuracy gives users confidence in the use of models for their experiments. For comparative models, the sequence identity shared between the target sequence and template structure has historically been used to estimate the accuracy of models, as it is easy to calculate and appreciate. Sequence identity, however, is a relatively poor predictor of model accuracy, especially below 40% (Figs. 2A, 3; Table 1), and actually adds little to the performance of TSVMod: Omitting sequence identity as a feature does not change the correlation coefficient between the actual and predicted RMSD, while the correlation coefficient between the actual and predicted NO3.5Å is reduced to 0.85 (from 0.86). Given these limitations of sequence identity, many researchers have been reluctant to use comparative models based on less than 30% sequence identity in their research for fear of errors. The reason is that the accuracy of such models can vary widely (Figs. 2A, 3; Table 1) and there has been no practical way to robustly and reliably predict the absolute accuracy of these models. As a result, the utility of comparative modeling is significantly reduced, because 76% of all comparative models are based on less than 30% sequence identity (Pieper et al. 2006). Our model-specific approach would result in a dramatic increase in the number of comparative models correctly assessed as useful by helping identify those that are accurate (Fig. 2A,B) and filtering incorrect models from consideration (Fig. 3). Of the 580,317 models in the test set, 485,066 models (84%) were from alignments with less than 30% sequence identity; 173,139 of these models had actual RMSD values below 5.0 Å, and were predicted as such (36% of the 580,317 total models). Figure 5 shows two of the many examples where target/template alignments shared under 12.5% sequence identity, but the models produced were accurate and assessed as such by the predicted RMSD and NO3.5Å scores.

Similarly, our protocol is able to identify which models are suitable for many common experiments. Relative to purely geometric metrics like RMSD and NO3.5Å, the two structure-derived properties calculated here (the neighborhood residue accuracy and the percentage of residues correctly modeled as exposed) are informative about the utility of a model for specific tasks such as guiding mutagenesis experiments, biochemical labeling, annotation of point mutations, protein design, predicting subcellular localization, in silico ligand docking, and
Figure 5. Examples of successful accuracy predictions where the sequence identity shared between the target and template was less than 12.5%, and yet very accurate models were constructed. Relying upon the individual features alone, neither model would be assessed as being very accurate, yet the weighted combination of these features using the model-specific assessment protocol leads to accurate assessments. In both images, the native structure is colored red and the model is blue. (A) Sequence from murine neuroglobin (PDB code 1q1fA) modeled using 1it2A as a template. (B) Sequence of 4-hydroxybenzoyl CoA thioesterase (1q4tA) modeled using 1s5uA as a template.
prediction of protein complex structures. Using the predicted accuracy of the models to estimate the accuracy of these structure-derived properties results in more precise and accurate estimates than relying upon sequence identity (Fig. 4), as has historically been done.

There are, however, a few limitations to our model assessment protocol. First, the accuracy of the protocol could be increased if we were willing to sacrifice coverage (i.e., not be able to predict the accuracy for all models). Given the current thresholds, we can predict the accuracy for 83% of the test set using tailored training sets populated by models of the same fold as the model being assessed (Methods) (Fig. 6). If we increased the minimum training set size threshold from five to 275 and did not utilize the secondary structure filtering step, predictions could be made for only 20% of the test set, but the RMSD and NO3.5Å correlation coefficients would increase from 0.84 and 0.86 to 0.90 and 0.92, respectively. A second limitation is that errors in the underlying scores can affect the accuracy of the prediction. Improvements of these scores, or the addition of other scores, will further increase the accuracy of the method.

There are many applications for our model assessment protocol. First, the protocol is being incorporated into

displays linearity between the actual and predicted accuracy even for models that are not native-like (Fig. 2C,D). In contrast, a refinement scheme relying upon a score such as DOPE would have a more difficult time, only being able to identify the unlikely event of sampling a very near-native solution (Fig. 2B). Furthermore, any refinement scheme built upon the model-specific scoring protocol would have the benefit of giving the user an estimate of the actual accuracy of the model. Fourth, we intend to develop a version of TSVMod using different feature types that will predict per-residue accuracy, in the spirit of similar approaches such as ModFOLDclust (McGuffin 2008), Prosa (Sippl 1993), FragQA (Gao et al. 2007), and ProQres (Wallner and Elofsson 2006).

Finally, though we have principally described our protocol in the context of evaluating comparative models, we believe the construction of a tailored training set by model size and secondary structure content will ultimately be applicable to models generated by any method, including de novo predictions. Using this filtering step on the test set and using only alignment-independent scores (N-DOPE, Z-PAIR, Z-SURFACE, Z-COMBINED, and two PSIPRED/DSSP scores) as input features, we can currently predict the RMSD and NO3.5Å errors with
our comprehensive database of comparative models, correlation coefficients (r) of 0.80 and 0.81, respectively,
MODBASE, to increase the confidence that end-users to the actual errors. These numbers are lower than those
have in using such models for their experiments (Fig. of the ‘‘standard’’ TSVMod because: (1) The lack of a
4B,D). Second, the predicted NO3.5Å value will be used clearly defined template precludes the use of the more
as a filter to ensure that only models assessed to have the accurate ‘‘left’’ branch of tailored training set construc-
correct fold are deposited in MODBASE (Fig. 3). Third, tion (Fig. 6), resulting in a suboptimal tailored training
we suggest that the model-specific protocol may also be a set; and (2) several informative alignment-based features
good scoring function for the refinement of comparative (GA341, percentage of gapped residues in the alignment,
models (D. Eramian and A. Sali, in prep.) because it sequence identity) cannot be used. In addition, an ap-
plication of TSVMod to models calculated by pro-
grams other than MODELLER may also suffer from the
optimization of the TSVMod features specifically for
MODELLER models (e.g., DOPE can be fooled by
decoys that are packed extremely tightly, as one would
expect in energy-minimized models). In spite of these
three significant limitations, we decided to test how well
the reduced TSVMod performs relative to model quality
assessment (MQAP) programs tested at CASP. The
accuracy values (RMSD and native overlap) were calcu-
lated using MODELLER’s superpose command for
30,186 of the CASP7 models; the remaining 10,120
coordinate files could not be read by MODELLER or
the programs used to calculate features for TSVMod.
TSVMod had correlations with RMSD and NO3.5Å of
0.62 and 0.73, respectively, and the global Spearman rank
correlation coefficient between the actual and predicted
NO3.5Å values was 0.75. Though the Pearson correlation
coefficient values are below those for the ‘‘full’’ TSVMod
Figure 6. Flowchart depicting the steps to predict the RMSD and NO3.5Å on our benchmark of 0.84 and 0.86, respectively, the
of an input comparative model. Spearman rank correlation coefficient is comparable to
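The Pearson and Spearman coefficients compared in this discussion reward different things: Pearson's r measures linear agreement between predicted and actual accuracies, while Spearman's coefficient measures only monotonic agreement. A minimal sketch makes the distinction concrete (the helper functions below are our own illustration, ignore tied ranks, and are not the evaluation code used in the benchmark):

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks
    (ties are ignored in this simplified sketch)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))
```

For predictions that are monotonically but nonlinearly related to the actual values, spearman is 1 while pearson falls below it, which is why the two coefficients can differ on the same set of predictions.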
www.proteinscience.org 1889
Downloaded from www.proteinscience.org on December 23, 2008 - Published by Cold Spring Harbor Laboratory Press
Eramian et al.
the reported values for MQAPs on the CASP7 set (McGuffin 2007), despite the fact that TSVMod was not designed for assessing CASP models. Improvement of TSVMod's performance at assessing such models would be expected if additional scores were included as features, and if the TSVMod training database was populated with models produced by the method being assessed.

In summary, we have developed a model-specific scoring protocol to predict the absolute accuracy of comparative models by optimizing the contributions of up to nine features via SVM regression. This approach has been shown to be able to predict the RMSD with a correlation to the actual RMSD of 0.84; predict the NO3.5Å with a correlation to the actual NO3.5Å of 0.86; differentiate between correct and incorrect models better than existing methods; identify models with accurate structure-derived properties better than relying upon sequence identity; and outperform the ProQ assessment score in predicting MaxSub of a model, even though our approach was not developed to predict MaxSub.

Methods

The test set was constructed by taking the first model produced from each of the 580,317 unique target/template alignments. The training database consisted of all 5,790,889 models. All models in the test set are also in the training database; this redundancy is accounted for during testing so the accuracy of the method is not overestimated. The model files, alignments, and the accompanying TSVMod predictions and individual feature scores are all available for download by anonymous ftp at http://salilab.org/decoys/.

Model accuracy measures

Three geometric accuracy measures were used: the Cα RMSD value between the model and the native structure after superposition, the fraction of Cα atoms within 3.5 Å of their correct positions in the native structure (the native overlap at 3.5 Å, or NO3.5Å), and the MaxSub score (Siew et al. 2000). The RMSD and NO3.5Å accuracy for each model were calculated by MODELLER's superpose command. As NO3.5Å is calculated by dividing the number of Cα atoms within 3.5 Å from their correct positions by the length of the sequence, one must choose an appropriate denominator. We chose to use the number of residues actually modeled, not the length of the input sequence, making our NO3.5Å measure a local accuracy measure. MaxSub was obtained from the Fischer laboratory and run with default parameters (Siew et al. 2000), with no correction made for the length of the input target sequence.
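As defined above, NO3.5Å is the fraction of modeled Cα atoms lying within 3.5 Å of their positions in the native structure after superposition. The definition can be sketched as follows (the function name and input format are our own illustration, not MODELLER's superpose implementation; superposition is assumed to have been done already):

```python
import math

def native_overlap(model_ca, native_ca, cutoff=3.5):
    """Fraction of modeled C-alpha atoms within `cutoff` angstroms of
    their positions in the (already superposed) native structure.

    `model_ca` and `native_ca` are equal-length lists of (x, y, z)
    coordinates covering only the residues actually modeled; using that
    length as the denominator makes this a local accuracy measure, as
    chosen in the text.
    """
    assert len(model_ca) == len(native_ca)
    within = sum(
        1 for m, n in zip(model_ca, native_ca)
        if math.dist(m, n) <= cutoff
    )
    return within / len(model_ca)
```

A model identical to the native structure scores 1.0; a model whose every Cα is displaced by more than 3.5 Å scores 0.0.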
The mean score of a random protein conformation is estimated by a weighted sum of protein composition over the 20 standard amino acid residue types, where each weight corresponds to the expected change in the score by inserting a specific type of amino acid residue. The weights are estimated from a separate training set of 1,686,320 models generated by MODPIPE.

Two PSIPRED (Jones 1999) and DSSP (Kabsch and Sander 1983) agreement scores were also calculated: the percentage of amino acid residues that had different Q3 states for both the model and the target sequence (PSIPREDPRCT), and a weighted score that takes into account the PSIPRED prediction confidence (PSIPREDWEIGHT). These scores were implemented as described elsewhere (Eramian et al. 2006).

Flowchart for predicting RMSD and NO3.5Å

A flowchart outlining the steps taken to create a tailored training set and make SVM predictions is presented in Figure 6. Once a model is built, ~20 sec of CPU time are needed to calculate the nine individual assessment criteria, followed by an additional 10 sec for the filtering and SVM stages; additional time is required if PSIPRED predictions have not been precalculated.

The first step is to determine whether or not the aligned target and template sequences share more than 85% sequence identity (Fig. 6). If so, the RMSD and native overlap are predicted to be 0.5 Å and 1.0, respectively, and no further steps are taken, because nearly all comparative models built on templates sharing such high sequence identity are native-like; only 0.9% of the test models surpass this threshold. The second step is to store the PDB identification code as well as the starting and ending residue indices of the template used to produce each model. Next, in the filtering step, the 5,790,889-model training database is first scanned to find all examples where the same region of the template was used either as a template or was itself the target sequence, modeled using a different template. A region is considered equivalent if the starting and ending points are each within 10 residues of the modeled region and its overall length is within 10% of the length of the model. If an entry in the training database used a chain from the same PDB file as the query target sequence, the entry was omitted to ensure that the tailored training set does not result in overestimating the accuracy of the method. Next, potential matches are filtered by the statistical potential scores and are included in the tailored training set only if the Z-PAIR, Z-SURFACE, and Z-COMBINED scores are each within 2 units of the score for the input model, and the N-DOPE score is within 0.5 units of that of the model.

If the tailored training set contained fewer than five examples, a separate filtering procedure is employed to populate the tailored training set; this occurred for 17% (99,947) of the test set. The secondary structure content of the input model is calculated using MODELLER's model.write_data command. The training database was then scanned to find all entries whose size is within 10% of the length of the model, and whose helical and strand content are each within x ± 10%, where x are the corresponding values for the model. Entries from the same PDB file as the query target sequence are again omitted. Potential matches are then filtered by the statistical potential scores as described above.

Finally, two SVMs are trained to predict the RMSD and NO3.5Å of the model. The SVMlight software package was used in regression mode, with a linear kernel, for all SVM training (Joachims 1999). The nine training features used are the nine aforementioned assessment scores. Tested values of the epsilon width of the tube for regression training for RMSD varied from 0.01 to 0.2, and values attempted for NO3.5Å ranged from 0.01 to 0.1. The final values selected for the epsilon width of the tube for the RMSD and NO3.5Å predictions were 0.1 and 0.05, respectively. All other SVMlight parameters were kept at their default values.

Fold assessment

The ability of individual scores to differentiate between correct and incorrect folds was assessed using receiver operating characteristic (ROC) plots (Albeck and Borgesen 1990; Metz et al. 1998), calculated with MODPIPE's ROC module. This module plots the true positive rate of classification against the false positive rate on the x-axis. A model was defined as having the correct fold if its NO3.5Å value exceeded a threshold; the two thresholds used were 0.30 and 0.50. If the model is correct, the prediction is a true positive (TP) if it is classified as correct, and a false negative (FN) if it is classified as incorrect. If instead the model is incorrect, the prediction is a true negative (TN) if the model is classified as incorrect, and a false positive (FP) if it is classified as correct. The true positive and false positive rates displayed on the ROC plot are calculated as tp = TP/P and fp = FP/N, where TP is the count of true positives, P is the sum of true positives and false negatives, FP is the count of false positives, and N is the sum of true negatives and false positives.

Comparison to other MQAP programs

To compare the model-specific approach to another approach for assessing absolute accuracy, the stand-alone version of ProQ v1.2 (Wallner and Elofsson 2003) was run for all models of the test set. ProQ is a neural network that predicts the LGScore and MaxSub of an input model, using a general, rather than a model-specific, training set. ProQ was run both with (ProQ-SS) and without (ProQ) PSIPRED v2.5 secondary structure predictions (Jones 1999). We also ran the residue-based score ProQres v1.0 (Wallner and Elofsson 2006) for all models of the test set. ProQres predicts the accuracy for each residue of an input model. To obtain a single score for an input model, the ProQres scores for each residue were summed and divided by the number of residues in the model.

ModFOLD (McGuffin 2007) is a neural network that combines data from ModSSEA (Pettitt et al. 2005), MODCHECK (McGuffin and Jones 2003), and ProQ to predict the accuracy of an input model. ModFOLD was trained using TM-scores (Zhang and Skolnick 2004) and is available as a web server. The ModFOLD server (McGuffin 2008) allows a user to upload .tar.gz files of up to 1000 models; the user must also upload the sequence of the model(s) being assessed. Because of this manual task and the high computational demands our 580,317-model set would place on the ModFOLD hardware, we instead tested the ModFOLD server (v1.1) with 36,453 randomly selected models for 225 sequences from our test set.

Acknowledgments

We acknowledge funds from the Sandler Family Supporting Foundation, U.S. National Institutes of Health (Grants R01-GM54762, R01-GM083960, U54-RR022220, U54-GM074945, P01-GM71790, U54-GM074929), U.S. National Science Foundation (Grant IIS-0705196), as well as Hewlett-Packard, Sun
Microsystems, IBM, NetApp Inc., and Intel Corporation for hardware gifts.

References

Albeck, M.J. and Borgesen, S.E. 1990. ROC-curve analysis. A statistical method for the evaluation of diagnostic tests. Ugeskr. Laeger 152: 1650–1653.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J., Chothia, C., and Murzin, A.G. 2004. SCOP database in 2004: Refinements integrate structure and sequence family data. Nucleic Acids Res. 32: D226–D229.

Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. 2004. UniProt: The Universal Protein knowledgebase. Nucleic Acids Res. 32: D115–D119.

Baker, D. and Sali, A. 2001. Protein structure prediction and structural genomics. Science 294: 93–96.

Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.

Bjelic, S. and Aqvist, J. 2004. Computational prediction of structure, substrate binding mode, mechanism, and rate for a malaria protease with a novel type of active site. Biochemistry 43: 14521–14528.

Bradley, P., Misura, K.M., and Baker, D. 2005. Toward high-resolution de novo structure prediction for small proteins. Science 309: 1868–1871.

Caffrey, C.R., Placha, L., Barinka, C., Hradilek, M., Dostal, J., Sajid, M., McKerrow, J.H., Majer, P., Konvalinka, J., and Vondrasek, J. 2005. Homology modeling and SAR analysis of Schistosoma japonicum cathepsin D (SjCD) with statin inhibitors identify a unique active site steric barrier with potential for the design of specific inhibitors. Biol. Chem. 386: 339–349.

Chakravarty, S. and Sanchez, R. 2004. Systematic analysis of added-value in simple comparative models of protein structure. Structure 12: 1461–1470.

Chmiel, A.A., Radlinska, M., Pawlak, S.D., Krowarsch, D., Bujnicki, J.M., and Skowronek, K.J. 2005. A theoretical model of restriction endonuclease NlaIV in complex with DNA, predicted by fold recognition and validated by site-directed mutagenesis and circular dichroism spectroscopy. Protein Eng. Des. Sel. 18: 181–189.

Chothia, C. and Lesk, A.M. 1986. The relation between the divergence of sequence and structure in proteins. EMBO J. 5: 823–826.

Colubri, A., Jha, A.K., Shen, M.Y., Sali, A., Berry, R.S., Sosnick, T.R., and Freed, K.F. 2006. Minimalist representations and the importance of nearest neighbor effects in protein folding simulations. J. Mol. Biol. 363: 835–857.

Costache, A.D., Pullela, P.K., Kasha, P., Tomasiewicz, H., and Sem, D.S. 2005. Homology-modeled ligand-binding domains of zebra fish estrogen receptors α, β1, and β2: From in silico to in vivo studies of estrogen interactions in Danio rerio as a model system. Mol. Endocrinol. 19: 2979–2990.

Domingues, F.S., Koppensteiner, W.A., Jaritz, M., Prlic, A., Weichenberger, C., Wiederstein, M., Floeckner, H., Lackner, P., and Sippl, M.J. 1999. Sustained performance of knowledge-based potentials in fold recognition. Proteins Suppl 3: 112–120.

Eramian, D., Shen, M.Y., Devos, D., Melo, F., Sali, A., and Marti-Renom, M.A. 2006. A composite score for predicting errors in protein structure models. Protein Sci. 15: 1653–1666.

Eswar, N., John, B., Mirkovic, N., Fiser, A., Ilyin, V.A., Pieper, U., Stuart, A.C., Marti-Renom, M.A., Madhusudhan, M.S., Yerkovich, B., et al. 2003. Tools for comparative protein structure modeling and analysis. Nucleic Acids Res. 31: 3375–3380.

Eswar, N., Webb, B.M., Marti-Renom, M., Madhusudhan, M.S., Eramian, D., Shen, M.Y., Pieper, U., and Sali, A. 2007. Comparative protein structure modeling using MODELLER. Curr. Protoc. Protein Sci. Chapter 2: Unit 2.9.

Fiser, A., Do, R.K., and Sali, A. 2000. Modeling of loops in protein structures. Protein Sci. 9: 1753–1773.

Fitzgerald, J.E., Jha, A.K., Colubri, A., Sosnick, T.R., and Freed, K.F. 2007. Reduced Cβ statistical potentials can outperform all-atom potentials in decoy identification. Protein Sci. 16: 2123–2139.

Gao, X., Bu, D., Li, S.C., Xu, J., and Li, M. 2007. FragQA: Predicting local fragment quality of a sequence-structure alignment. Genome Inform. 19: 27–39.

Gatchell, D.W., Dennis, S., and Vajda, S. 2000. Discrimination of near-native protein structures from misfolded models by empirical free energy functions. Proteins 41: 518–534.

Ginalski, K., Grishin, N.V., Godzik, A., and Rychlewski, L. 2005. Practical lessons from protein structure prediction. Nucleic Acids Res. 33: 1874–1891.

Hubbard, S.J., Campbell, S.F., and Thornton, J.M. 1991. Molecular recognition. Conformational analysis of limited proteolytic sites and serine proteinase protein inhibitors. J. Mol. Biol. 220: 507–530.

Jaroszewski, L., Li, W., and Godzik, A. 2002. In search for more accurate alignments in the twilight zone. Protein Sci. 11: 1702–1713.

Joachims, T. 1999. Making large-scale SVM learning practical. In Advances in kernel methods: Support vector learning (eds. B. Schölkopf et al.). MIT Press, Cambridge, MA.

Jones, D.T. 1999. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292: 195–202.

Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.

Lazaridis, T. and Karplus, M. 1999. Discrimination of the native from misfolded protein models with an energy function including implicit solvation. J. Mol. Biol. 288: 477–487.

Lazaridis, T. and Karplus, M. 2000. Effective energy functions for protein structure prediction. Curr. Opin. Struct. Biol. 10: 139–145.

Lee, B. and Richards, F.M. 1971. The interpretation of protein structures: Estimation of static accessibility. J. Mol. Biol. 55: 379–400.

Lu, M., Dousis, A.D., and Ma, J. 2008. OPUS-PSP: An orientation-dependent statistical all-atom potential derived from side-chain packing. J. Mol. Biol. 376: 288–301.

Madhusudhan, M.S., Marti-Renom, M.A., Sanchez, R., and Sali, A. 2006. Variable gap penalty for protein sequence-structure alignment. Protein Eng. Des. Sel. 19: 129–133.

Marko, A.C., Stafford, K., and Wymore, T. 2007. Stochastic pairwise alignments and scoring methods for comparative protein structure modeling. J. Chem. Inf. Model. 47: 1263–1270.

Marti-Renom, M.A., Stuart, A.C., Fiser, A., Sanchez, R., Melo, F., and Sali, A. 2000. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29: 291–325.

Marti-Renom, M.A., Madhusudhan, M.S., and Sali, A. 2004. Alignment of protein sequences by their profiles. Protein Sci. 13: 1071–1087.

McGuffin, L.J. 2007. Benchmarking consensus model quality assessment for protein fold recognition. BMC Bioinformatics 8: 345.

McGuffin, L.J. 2008. The ModFOLD server for the quality assessment of protein structural models. Bioinformatics 24: 586–587.

McGuffin, L.J. and Jones, D.T. 2003. Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 19: 874–881.

Melo, F. and Feytmans, E. 1997. Novel knowledge-based mean force potential at atomic level. J. Mol. Biol. 267: 207–222.

Melo, F. and Feytmans, E. 1998. Assessing protein structures with a non-local atomic interaction energy. J. Mol. Biol. 277: 1141–1152.

Melo, F. and Sali, A. 2007. Fold assessment for comparative protein structure modeling. Protein Sci. 16: 2412–2426.

Melo, F., Sanchez, R., and Sali, A. 2002. Statistical potentials for fold assessment. Protein Sci. 11: 430–448.

Metz, C.E., Herman, B.A., and Roe, C.A. 1998. Statistical comparison of two ROC-curve estimates obtained from partially paired datasets. Med. Decis. Making 18: 110–121.

Miyazawa, S. and Jernigan, R.L. 1996. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol. 256: 623–644.

Park, B. and Levitt, M. 1996. Energy functions that discriminate X-ray and near-native folds from well-constructed decoys. J. Mol. Biol. 258: 367–392.

Park, B.H., Huang, E.S., and Levitt, M. 1997. Factors affecting the ability of energy functions to discriminate correct from incorrect folds. J. Mol. Biol. 266: 831–846.

Pettitt, C.S., McGuffin, L.J., and Jones, D.T. 2005. Improving sequence-based fold recognition by using 3D model quality assessment. Bioinformatics 21: 3509–3515.

Pieper, U., Eswar, N., Davis, F.P., Braberg, H., Madhusudhan, M.S., Rossi, A., Marti-Renom, M., Karchin, R., Webb, B.M., Eramian, D., et al. 2006. MODBASE: A database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 34: D291–D295.

Qiu, J., Sheffler, W., Baker, D., and Noble, W.S. 2007. Ranking predicted protein structures with support vector regression. Proteins 71: 1175–1182.

Rai, B.K. and Fiser, A. 2006. Multiple mapping method: A novel approach to the sequence-to-structure alignment problem in comparative protein structure modeling. Proteins 63: 644–661.

Rost, B. 1999. Twilight zone of protein sequence alignments. Protein Eng. 12: 85–94.

Sali, A. and Blundell, T.L. 1993. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234: 779–815.

Samudrala, R. and Levitt, M. 2000. Decoys 'R' Us: A database of incorrect conformations to improve protein structure prediction. Protein Sci. 9: 1399–1401.

Sanchez, R., Pieper, U., Melo, F., Eswar, N., Marti-Renom, M.A., Madhusudhan, M.S., Mirkovic, N., and Sali, A. 2000. Protein structure modeling for structural genomics. Nat. Struct. Biol. 7: 986–990.

Sauder, J.M., Arthur, J.W., and Dunbrack Jr., R.L. 2000. Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins 40: 6–22.

Seok, C., Rosen, J.B., Chodera, J.D., and Dill, K.A. 2003. MOPED: Method for optimizing physical energy parameters using decoys. J. Comput. Chem. 24: 89–97.

Shen, M.Y. and Sali, A. 2006. Statistical potential for assessment and prediction of protein structures. Protein Sci. 15: 2507–2524.

Shen, M.Y., Davis, F.P., and Sali, A. 2005. The optimal size of a globular protein domain: A simple sphere-packing model. Chem. Phys. Lett. 405: 224–228.

Shortle, D., Simons, K.T., and Baker, D. 1998. Clustering of low-energy conformations near the native structures of small proteins. Proc. Natl. Acad. Sci. 95: 11158–11162.

Siew, N., Elofsson, A., Rychlewski, L., and Fischer, D. 2000. MaxSub: An automated measure for the assessment of protein structure prediction quality. Bioinformatics 16: 776–785.

Sippl, M.J. 1993. Recognition of errors in three-dimensional structures of proteins. Proteins 17: 355–362.

Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147: 195–197.

Tondel, K. 2004. Prediction of homology model quality with multivariate regression. J. Chem. Inf. Comput. Sci. 44: 1540–1551.

Tramontano, A., Leplae, R., and Morea, V. 2001. Analysis and assessment of comparative modeling predictions in CASP4. Proteins Suppl 5: 22–38.

Tsai, J., Bonneau, R., Morozov, A.V., Kuhlman, B., Rohl, C.A., and Baker, D. 2003. An improved protein decoy set for testing energy functions for protein structure prediction. Proteins 53: 76–87.

Wallner, B. and Elofsson, A. 2003. Can correct protein models be identified? Protein Sci. 12: 1073–1086.

Wallner, B. and Elofsson, A. 2006. Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci. 15: 900–913.

Xu, W., Yuan, X., Xiang, Z., Mimnaugh, E., Marcu, M., and Neckers, L. 2005. Surface charge and hydrophobicity determine ErbB2 binding to the Hsp90 chaperone complex. Nat. Struct. Mol. Biol. 12: 120–126.

Zhang, Y. and Skolnick, J. 2004. Scoring function for automated assessment of protein structure template quality. Proteins 57: 702–710.

Zhou, H. and Zhou, Y. 2002. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci. 11: 2714–2726.