CompSim Project 2
Systems
Project 2
The objective of the paper is to improve the accuracy of protein structure predictions,
particularly for proteins whose structures are distant from those with known
experimental data. While in silico methods like template-based modeling and
machine learning are commonly used for structure prediction, their accuracy
tends to decrease for such distant proteins. The paper seeks to address this challenge
by developing a new refinement protocol that builds on existing molecular
dynamics (MD) simulation-based methods, which have been successful in improving
predicted protein model quality.
To achieve this, the authors focus on enhancing the refinement process by employing
optimized biasing functions, applying increased simulation temperatures,
and exploring the conformational space more broadly using alternative initial
models. These innovations aim to improve the sampling of conformations closer to
the native state, ultimately yielding higher-quality protein models. The paper
presents benchmark tests in which this new refinement protocol significantly
outperforms previous state-of-the-art MD-based methods, offering a more effective
approach for refining predicted protein structures.
However, even with these advancements, predicted structures often deviate from
experimental accuracy. Homology-based models tend to resemble their templates
closely but may not capture finer details such as side-chain packing and
structural variations caused by sequence differences. Similarly, machine learning-
based models, while effective in predicting inter-residue distances, tend to focus on
average structures of homologous sequences and may not accurately reflect the
unique features of a specific target sequence. Both approaches face challenges in
predicting regions with low sequence similarity or poor alignment, leading to
structural inaccuracies.
The protein refinement protocol proposed by the authors aims to mitigate
these problems, with the goal of bringing the models as close to experimental
accuracy as possible.
Background of previous approaches and techniques.
The paper discusses the evolution of protein model refinement methods, starting
from earlier techniques that focused on improving local structures, like side chain
packing and hydrogen bond networks, to more advanced molecular dynamics
(MD) simulation-based methods. MD simulations enable extensive
conformational sampling, improving both local and global structure refinement
by guiding the model toward the native state with lower free energy. One of the most
successful strategies in MD-based refinement is ensemble averaging, which
reduces sensitivity to force field inaccuracies and helps models better resemble
experimental structures. Recent advances in machine-learning-based protein
structure predictions have provided better initial models for MD refinement,
allowing significant improvements in structure quality, such as packing of side
chains and loop structures.
The refinement protocol outlined in the study consists of three key stages: pre-
sampling, sampling, and post-sampling. Starting with an initial protein model,
stereochemical errors such as atomic clashes and improper bond states are corrected
using the locPREFMD method. The refined model is then solvated in a water box
and heated to a target simulation temperature. During the sampling stage, molecular
dynamics (MD) simulations are conducted with different strategies to generate a
range of structures. Two ensemble selection schemes are used in the post-
sampling stage to select structures: either the 25% lowest-scoring conformations
based on the RWplus scoring function or a combination of RWplus scores and
deviations from the initial structure. These selected structures are averaged, and any
stereochemical issues from averaging are resolved through short MD
simulations, side chain rebuilding, and further polishing using locPREFMD.
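The first ensemble selection scheme described above (keep the lowest-scoring 25% of conformations, then average them) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `select_and_average`, the array layout, and the assumption that all frames are already superposed on a common reference are hypothetical, and the scores stand in for RWplus values computed elsewhere.

```python
import numpy as np

def select_and_average(coords, scores, fraction=0.25):
    """Average the lowest-scoring fraction of sampled conformations.

    coords : array of shape (n_frames, n_atoms, 3), assumed pre-superposed
    scores : array of shape (n_frames,), lower is better (RWplus-like)
    Returns the averaged coordinates and the indices of the frames kept.
    """
    coords = np.asarray(coords, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Keep at least one frame, rounding the subset size up.
    n_keep = max(1, int(np.ceil(fraction * len(scores))))
    # Indices of the n_keep lowest scores.
    keep = np.argsort(scores)[:n_keep]
    # Coordinate averaging can distort bond geometry, which is why the
    # protocol follows it with short MD and side-chain rebuilding.
    return coords[keep].mean(axis=0), keep
```

The averaged structure returned here would still need the stereochemical clean-up steps named in the text before it could serve as a final model.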
The simulations are performed using the CHARMM36m force field (or a modified
version) with explicit water molecules. The modification in the force field includes
lowering the energy barriers for backbone dihedral angles to speed up
conformational transitions while maintaining protein stability. The initial
equilibration involves aligning the protein model, solvating the system, and
neutralizing the charge with sodium or chloride ions. The system is minimized, then
equilibrated with Langevin dynamics to gradually heat it to the target
temperature, followed by adjusting the simulation box in the NpT ensemble.
Harmonic restraints are applied to the Cα atoms during equilibration to stabilize the
system. These steps ensure efficient sampling and refinement while minimizing
computational costs.
The study tested different types of restraints to bias sampling during protein model
refinement. Restraints are commonly applied to keep structures close to their initial
models, which helps the sampling process focus on finding the correct structure in
the vicinity of the starting conformation. However, restraints can also hinder
conformational transitions by raising the energy barriers, which can make
refinement more difficult, particularly when the initial model significantly deviates
from the native state. One of the methods previously used involves flat-bottom
harmonic restraints on the Cartesian coordinates of Cα atoms, which penalize the
deviation between the Cα coordinates in the current conformation and those in the
reference model only once it exceeds the flat-bottom width. The force constant was set to
0.025 kcal/mol/Ų, and the flat-bottom width was 4 Å. This method allowed
transitions within "restraint-free" regions, which in some cases led to significant
structural improvements.
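A flat-bottom harmonic restraint on Cα positions can be sketched as below. This is an illustration under stated assumptions: it uses the common form E = Σ_i k · max(0, |r_i − r_i_ref| − width)², and the prefactor convention (k versus k/2), the function name, and the argument names are not taken from the paper. The defaults reuse the force constant and width quoted in the text.

```python
import numpy as np

def flat_bottom_cartesian_energy(coords, ref_coords, k=0.025, width=4.0):
    """Flat-bottom harmonic restraint energy on Cα Cartesian coordinates.

    coords, ref_coords : arrays of shape (n_atoms, 3), in Å
    k     : force constant in kcal/mol/Å^2 (assumed prefactor convention)
    width : flat-bottom width in Å; deviations below it cost nothing
    """
    coords = np.asarray(coords, dtype=float)
    ref = np.asarray(ref_coords, dtype=float)
    # Per-atom displacement from the reference position.
    deviation = np.linalg.norm(coords - ref, axis=1)
    # Only the part of the deviation beyond the flat region is penalized.
    excess = np.maximum(0.0, deviation - width)
    return float(np.sum(k * excess ** 2))
```

With these defaults, an atom displaced by 3 Å or 4 Å contributes no energy, while one displaced by 6 Å contributes 0.025 × 2² = 0.1 kcal/mol, which is the "restraint-free" region the text refers to.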
Additionally, the study tested a variant using flat-bottom harmonic restraints applied
to the distances between Cα atoms, which compare the distance between the Cα atoms of
the i-th and j-th residues in the current conformation with the same distance in the
reference model.
This restraint was applied to residue pairs with distances below 10 Å in the initial
model and separated by more than three residues. The force constant for this
restraint was set to 0.05 kcal/mol/Ų, and the flat-bottom width was 2 Å.
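The pair selection and distance-based restraint described above can be sketched in the same spirit. Again this is a hedged illustration, not the paper's code: the functional form E = Σ_ij k · max(0, |d_ij − d_ij_ref| − width)², the helper names `select_pairs` and `flat_bottom_distance_energy`, and the reading of "separated by more than three residues" as an index difference of at least four are assumptions.

```python
import numpy as np

def select_pairs(ref_coords, cutoff=10.0, min_separation=4):
    """Pick Cα pairs closer than `cutoff` Å in the reference model whose
    sequence separation is at least `min_separation` residues."""
    ref = np.asarray(ref_coords, dtype=float)
    # Full pairwise distance matrix of the reference Cα atoms.
    dmat = np.linalg.norm(ref[:, None, :] - ref[None, :, :], axis=-1)
    # Upper-triangle indices with j - i >= min_separation.
    i, j = np.triu_indices(len(ref), k=min_separation)
    mask = dmat[i, j] < cutoff
    return i[mask], j[mask], dmat[i, j][mask]

def flat_bottom_distance_energy(coords, i, j, ref_dist, k=0.05, width=2.0):
    """Flat-bottom harmonic restraint energy on selected Cα-Cα distances."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    # Penalize only the distance change beyond the flat-bottom width.
    excess = np.maximum(0.0, np.abs(dist - ref_dist) - width)
    return float(np.sum(k * excess ** 2))
```

Because the energy depends only on internal distances, rigid-body motion of the whole model costs nothing, unlike the Cartesian variant.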
The study also introduced a combined restraint, which switched between
Cartesian and distance-based restraints.
Raising the simulation temperature accelerates sampling but risks unfolding the
protein. Applying restraints with respect to the initial model could mitigate this
risk by preventing excessive unfolding, making it possible to improve sampling
without compromising stability.
To explore the optimal temperature for refinement simulations with restraints, the
study conducted refinement runs at various temperatures: 298.15, 320, 340, 360,
and 380 K. The goal was to identify the temperature at which enhanced
sampling could be achieved while maintaining protein stability, with the
combined restraint being used in these simulations.
The study investigated whether refinement is more effective when using alternative
models as initial inputs, rather than the model closest to the native structure. The
hypothesis was that refinement success depends more on overcoming kinetic barriers
than on the proximity of the initial model to the native state, measured by metrics
like GDT-HA or RMSD. Alternative models, even if initially further from the native
structure, could potentially lead to better refinement if they are less trapped
kinetically.
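For readers unfamiliar with the two metrics named above, the sketch below computes Cα RMSD and a simplified GDT-HA. It assumes the model and native Cα coordinates are already superposed and matched residue by residue; a full GDT-HA calculation searches over many superpositions to maximize each fraction, so this fixed-superposition version is only a lower-bound illustration. The function names are hypothetical.

```python
import numpy as np

def ca_rmsd(model, native):
    """Root-mean-square deviation over matched Cα atoms, in Å."""
    diff = np.asarray(model, dtype=float) - np.asarray(native, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def gdt_ha_fixed_superposition(model, native, cutoffs=(0.5, 1.0, 2.0, 4.0)):
    """Simplified GDT-HA on a 0-100 scale: the mean, over the four cutoffs,
    of the fraction of Cα atoms within that distance of the native
    position, evaluated for a single given superposition."""
    dev = np.linalg.norm(
        np.asarray(model, dtype=float) - np.asarray(native, dtype=float), axis=1
    )
    fractions = [(dev <= c).mean() for c in cutoffs]
    return 100.0 * float(np.mean(fractions))
```

The example illustrates why the two metrics can disagree: a single badly placed residue inflates RMSD strongly but lowers GDT-HA only by its share of the residue count.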
To further explore the use of multiple templates, selected models and the initial
model were hybridized using Rosetta's "iterative hybridize" protocol. This was
done with the "simple" option to prevent mutations, and various restraints were
applied. Flat-bottom harmonic restraints were used for structurally conserved
regions where the Cα−Cα RMSD between the models and the initial model was less
than 2 Å. These restraints were applied with k₀ = 0.25 kcal/mol/Ų and b_flat =
1 Å. Instead of the original 50 populations over 50 iterations, this study used 10
populations over 10 iterations for hybridization.
MD simulations were conducted for the original initial model and the alternative
models, with five 100 ns trajectories performed for each. Sampling was carried out
with restraints applied to each initial model. A refined model was generated after
post-sampling for each initial model, and an additional refined model was created by
aggregating sampled conformations from both the original and alternative models.
The study aimed to explore if using alternative models could enhance refinement
performance despite deviations from the native state.
Refinement Protocols.
The refinement performance with the new sampling protocol, which used both the
original initial model and additional alternative models, was benchmarked and
compared against previous refinement protocols. For comparison, two reference
protocols were employed. The first, CASP12-simple, mirrored the protocol used
during CASP12 or the conservative approach from CASP13, and involved MD
simulations with harmonic restraints on Cα coordinates at 298.15 K. The
second, CASP13-simple, simplified the iterative protocol from CASP13 by omitting
iterative sampling and using RWplus scoring for ensemble selection. CASP13-
simple also employed MD simulations with flat-bottom harmonic restraints on
Cα coordinates at 298.15 K.
Benchmark Sets:
Two benchmark sets were used for evaluating the performance. The first set included
28 CASP10 refinement targets, which were used to assess the effects of
simulation parameters such as temperature and the type of bias restraints. The
second set contained 103 CASP11−13 refinement targets, serving as the main
test set for comparing different refinement protocols. Five CASP12 targets were
excluded due to the absence of experimental structures.
This thorough benchmarking allowed for a detailed comparison of refinement
performance across different protocols, highlighting the effects of temperature,
restraints, and the inclusion of alternative models in the new CASP14 sampling
protocol.
Analysis of Refinement of Biologically Important Regions.
RESULTS
Examples of successful refinement include TR663 (PDB ID: 4EXR), where distance
restraints facilitated better collective movements of secondary structure elements,
enhancing both global and local qualities. However, TR699 (PDB ID: 4KT7), a dimer
structure, posed challenges with distance restraints, which resulted in a misplaced β-
turn, highlighting the limitations in handling oligomerization and protein−protein
interactions. In contrast, Cartesian restraints preserved the β-turn in this case.
The paper found that using multiple initial models improved refinement
regardless of the ensemble selection protocol.
Concluding Remarks
This paper reinforced many of the concepts I learned in the course, particularly
regarding MD simulations and my use of GROMACS. Working with GROMACS
helped me understand crucial aspects like solvation models, force field
selection, and energy minimization protocols, which are vital for achieving
stable initial conformations before running simulations. I also gained hands-on
experience with temperature coupling algorithms and how they influence
thermodynamic properties in the simulated system. The ability to run
equilibration phases and analyze RMSD and radius of gyration outputs using
GROMACS was key to mastering protein structure refinement workflows.
As an aspiring dry lab researcher, I found this course especially valuable because it
allowed me to apply computational modeling to real biological systems,
deepening my understanding of molecular interactions. I am grateful to Dr.
HamsaPriya for offering such a course, and I am also thankful to the teaching
assistants for their help. Their support (often during unholy hours)
made complex topics more accessible and ensured a smooth learning experience, and I
cannot thank them enough for their patience and guidance.