BT5420 Computer Simulations of Biomolecular Systems
Project 2

S V Venkata Sai Vishwesvar | BE22B042

Improved Sampling Strategies for Protein Model Refinement Based on Molecular Dynamics Simulation

Lim Heo, Collin F. Arbour, Giacomo Janson, and Michael Feig


The objective of the paper.

The objective of the paper is to improve the accuracy of protein structure predictions,
particularly for proteins whose structures are distant from those with known
experimental data. While in silico methods like template-based modeling and
machine learning are commonly used for structure prediction, their accuracy
tends to decrease for such distant proteins. The paper seeks to address this challenge
by developing a new refinement protocol that builds on existing molecular
dynamics (MD) simulation-based methods, which have been successful in improving
predicted protein model quality.

To achieve this, the authors focus on enhancing the refinement process by employing
optimized biasing functions, applying increased simulation temperatures,
and exploring the conformational space more broadly using alternative initial
models. These innovations aim to improve the sampling of conformations closer to
the native state, ultimately yielding higher-quality protein models. The paper
presents benchmark tests in which this new refinement protocol significantly
outperforms previous state-of-the-art MD-based methods, offering a more effective
approach for refining predicted protein structures.

Current Computational Approaches for Protein Structure Prediction and Their Limitations

The paper discusses the progress and challenges in computational modeling of
protein structures. It highlights that computational approaches, particularly those
using homologous structures and machine learning, have become essential
alternatives to experimental methods for determining protein structures.
Homology-based methods are now more reliable due to the growing number of
experimentally determined structures and sequence databases, allowing for structure
prediction with reasonable accuracy. Machine learning models, especially deep
neural networks that leverage co-evolutionary couplings, have been particularly
groundbreaking for predicting structures where no close homologues exist.

However, even with these advancements, predicted structures often deviate from
experimental accuracy. Homology-based models tend to resemble their templates
closely but may not capture finer details such as side-chain packing and
structural variations caused by sequence differences. Similarly, machine learning-
based models, while effective in predicting inter-residue distances, tend to focus on
average structures of homologous sequences and may not accurately reflect the
unique features of a specific target sequence. Both approaches face challenges in
predicting regions with low sequence similarity or poor alignment, leading to
structural inaccuracies.

The protein refinement protocol proposed by the authors aims to mitigate these
problems, with the goal of bringing models as close to experimental accuracy as possible.

Background of previous approaches and techniques.

The paper discusses the evolution of protein model refinement methods, starting
from earlier techniques that focused on improving local structures, like side chain
packing and hydrogen bond networks, to more advanced molecular dynamics
(MD) simulation-based methods. MD simulations enable extensive
conformational sampling, improving both local and global structure refinement
by guiding the model toward the native state with lower free energy. One of the most
successful strategies in MD-based refinement is ensemble averaging, which
reduces sensitivity to force field inaccuracies and helps models better resemble
experimental structures. Recent advances in machine-learning-based protein
structure predictions have provided better initial models for MD refinement,
allowing significant improvements in structure quality, such as packing of side
chains and loop structures.

MD simulations are now widely used for conformational sampling during
refinement, providing an effective way to explore diverse conformational states while
preventing large-scale unfolding, which could degrade model quality. Challenges
arise due to kinetic barriers and the need for partial unfolding and refolding,
especially in larger proteins. Extensive MD simulations or enhanced sampling
techniques like replica exchange MD have been proposed to overcome these
barriers, though they are computationally expensive. To address this, restraints are
often applied to the initial models during MD simulations to prevent unfolding and
enable a quicker path to the native state, though some unfolding is crucial to refine
structural defects.

METHODS AND TECHNIQUES:

Overview of the Refinement protocol.

The refinement protocol outlined in the study consists of three key stages: pre-
sampling, sampling, and post-sampling. Starting with an initial protein model,
stereochemical errors such as atomic clashes and improper bond states are corrected
using the locPREFMD method. The refined model is then solvated in a water box
and heated to a target simulation temperature. During the sampling stage, molecular
dynamics (MD) simulations are conducted with different strategies to generate a
range of structures. Two ensemble selection schemes are used in the post-
sampling stage to select structures: either the 25% lowest-scoring conformations
based on the RWplus scoring function or a combination of RWplus scores and
deviations from the initial structure. These selected structures are averaged, and any
stereochemical issues from averaging are resolved through short MD
simulations, side chain rebuilding, and further polishing using locPREFMD.
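The first post-sampling scheme (keep the 25% lowest-scoring conformations, then average them) can be illustrated with a minimal NumPy sketch. It assumes the sampled frames are already superimposed on a common reference and that an RWplus score per frame has been computed elsewhere; the function name `select_and_average` and the toy scores are hypothetical, and the second scheme that mixes RWplus with deviation from the initial model is not shown.

```python
import numpy as np

def select_and_average(coords, scores, fraction=0.25):
    """Average the lowest-scoring fraction of an ensemble.

    coords: (n_frames, n_atoms, 3) array, assumed already superimposed
            onto a common reference frame.
    scores: (n_frames,) RWplus-style scores, where lower is better.
    Returns the averaged coordinates and the indices of the kept frames.
    """
    coords = np.asarray(coords, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(fraction * len(scores))))
    # Indices of the n_keep lowest (best) scores
    keep = np.argsort(scores)[:n_keep]
    # Plain Cartesian average; the result can have distorted geometry,
    # which is why the protocol follows averaging with short MD and polishing
    return coords[keep].mean(axis=0), keep

# Toy ensemble: 8 frames of a 2-atom "protein" with made-up scores
rng = np.random.default_rng(0)
coords = rng.normal(size=(8, 2, 3))
scores = np.array([5.0, 1.0, 7.0, 2.0, 9.0, 3.0, 8.0, 6.0])
avg, kept = select_and_average(coords, scores)
print(sorted(kept.tolist()))  # → [1, 3]
```

Averaging coordinates is what makes the post-sampling repair steps necessary: an average of several valid conformations is generally not itself a valid conformation.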

The simulations are performed using the CHARMM36m force field (or a modified
version) with explicit water molecules. The modification in the force field includes
lowering the energy barriers for backbone dihedral angles to speed up
conformational transitions while maintaining protein stability. The initial
equilibration involves aligning the protein model, solvating the system, and
neutralizing the charge with sodium or chloride ions. The system is minimized, then
equilibrated with Langevin dynamics to gradually heat it to the target
temperature, followed by adjusting the simulation box in the NpT ensemble.
Harmonic restraints are applied to the Cα atoms during equilibration to stabilize the
system. These steps ensure efficient sampling and refinement while minimizing
computational costs.

Simulation details as given in the paper.

Parameter                      Setting
Force field                    CHARMM36m
Water model                    TIP3P
PBC box                        Periodic rectangular box with a margin of 9 Å from the protein to the box edges
Neutralization                 Sodium or chloride ions used to neutralize the system
Minimization                   L-BFGS-B algorithm, up to 500 steps
Temperature                    Gradually heated to the target temperature
Equilibration (NVT and NpT)    NVT ensemble first, followed by the NpT ensemble at 1 bar
Barostat                       Monte Carlo barostat
Thermostat                     Langevin dynamics with a friction coefficient of 0.01 ps⁻¹
Electrostatic interactions     PME
Bond constraints               SHAKE algorithm applied to bonds involving hydrogens
Cutoff                         Switching function for nonbonded interactions between 8 and 10 Å
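The 9 Å box margin and the 8 to 10 Å switching window can be made concrete with a small sketch, assuming the margin is applied on both sides along each axis (box edge = protein extent + 2 × 9 Å). The smooth polynomial in `switching` is the form OpenMM documents for its switching function; whether the authors' setup uses exactly this form is an assumption, and both function names are hypothetical.

```python
import numpy as np

def box_dimensions(coords, margin=9.0):
    """Edge lengths (Å) of a rectangular box that leaves `margin` Å between
    the protein's extent and each box face along x, y and z."""
    coords = np.asarray(coords, dtype=float)
    extent = coords.max(axis=0) - coords.min(axis=0)
    return extent + 2.0 * margin

def switching(r, r_switch=8.0, r_cut=10.0):
    """Smooth factor scaling a nonbonded term from 1 at r_switch to 0 at r_cut.

    Polynomial form assumed: S = 1 - 6x^5 + 15x^4 - 10x^3,
    with x = (r - r_switch) / (r_cut - r_switch) clipped to [0, 1].
    """
    r = np.asarray(r, dtype=float)
    x = np.clip((r - r_switch) / (r_cut - r_switch), 0.0, 1.0)
    return 1.0 - 6.0 * x**5 + 15.0 * x**4 - 10.0 * x**3

protein = np.array([[0.0, 0.0, 0.0], [30.0, 10.0, 20.0]])
print(box_dimensions(protein).tolist())            # → [48.0, 28.0, 38.0]
print(switching([7.0, 9.0, 10.0, 12.0]).tolist())  # → [1.0, 0.5, 0.0, 0.0]
```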

Refinement with various restraint biases.

The study tested different types of restraints to bias sampling during protein model
refinement. Restraints are commonly applied to keep structures close to their initial
models, which helps the sampling process focus on finding the correct structure in
the vicinity of the starting conformation. However, restraints can also hinder
conformational transitions by raising the energy barriers, which can make
refinement more difficult, particularly when the initial model significantly deviates
from the native state. One of the methods previously used involves flat-bottom
harmonic restraints on the Cartesian coordinates of Cα atoms, expressed
mathematically as:

$$E_{\text{Cart}} = \sum_{i} k_0 \left[\max\left(0,\; \lvert \mathbf{r}_i - \mathbf{r}_i^{0} \rvert - b_{\text{flat}}\right)\right]^2$$

where r_i and r_i^0 are the Cartesian coordinates of the Cα atom of residue i in the current
conformation and the reference model, respectively. The force constant k₀ was set to
0.025 kcal/mol/Ų, and the flat-bottom width b_flat was 4 Å. This method allowed
transitions within "restraint-free" regions, which in some cases led to significant
structural improvements.
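A minimal NumPy sketch of the flat-bottom Cartesian restraint energy, assuming the form E = Σ k₀·max(0, |rᵢ − rᵢ⁰| − b_flat)² with no factor of ½ (implementations differ on this) and the parameters quoted above. The function name is hypothetical; in an actual MD engine this term would be added as a custom force rather than evaluated in Python.

```python
import numpy as np

def cartesian_flat_bottom(coords, ref, k0=0.025, b_flat=4.0):
    """Flat-bottom harmonic restraint energy (kcal/mol) on Cα positions.

    coords, ref: (n_residues, 3) Cα coordinates in Å, assumed to share a
    frame (no superposition is done here).
    """
    dev = np.linalg.norm(np.asarray(coords, float) - np.asarray(ref, float), axis=1)
    # No penalty inside the flat bottom; harmonic growth beyond b_flat
    excess = np.maximum(0.0, dev - b_flat)
    return float(np.sum(k0 * excess**2))

ref = np.zeros((3, 3))
cur = np.array([[3.0, 0.0, 0.0],   # 3 Å away: inside the 4 Å flat bottom
                [6.0, 0.0, 0.0],   # 6 Å away: 2 Å beyond it
                [0.0, 0.0, 8.0]])  # 8 Å away: 4 Å beyond it
print(round(cartesian_flat_bottom(cur, ref), 6))  # → 0.5, i.e. 0.025 * (2**2 + 4**2)
```

The first atom contributes nothing even though it moved 3 Å, which is exactly the "restraint-free" freedom described above.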
Additionally, the study tested a variant using flat-bottom harmonic restraints applied
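to the distances between Cα atoms, expressed as:

$$E_{\text{dist}} = \sum_{(i,j)} k_0 \left[\max\left(0,\; \lvert d_{ij} - d_{ij}^{0} \rvert - b_{\text{flat}}\right)\right]^2$$

where d_ij and d_ij^0 are the distances between the Cα atoms of the i-th and j-th residues in
the current conformation and the reference model, respectively.

This restraint was applied to residue pairs with distances below 10 Å in the initial
model and separated by more than three residues. The force constant k₀ for this
restraint was set to 0.05 kcal/mol/Ų, and the flat-bottom width b_flat was 2 Å.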
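The pair selection rule (reference Cα-Cα distance below 10 Å, more than three residues apart) and the distance-based flat-bottom energy can be sketched the same way. This assumes the same max(0, ·)² form as the Cartesian term; the helper names and the toy U-shaped chain are hypothetical.

```python
import numpy as np

def restraint_pairs(ref_ca, cutoff=10.0, min_sep=3):
    """Pairs (i, j) whose Cα-Cα distance in the reference model is below
    `cutoff` Å and whose sequence separation exceeds `min_sep`."""
    ref_ca = np.asarray(ref_ca, dtype=float)
    d0 = np.linalg.norm(ref_ca[:, None, :] - ref_ca[None, :, :], axis=-1)
    n = len(ref_ca)
    pairs = [(i, j) for i in range(n) for j in range(i + min_sep + 1, n)
             if d0[i, j] < cutoff]
    return pairs, d0

def distance_flat_bottom(ca, pairs, d0, k0=0.05, b_flat=2.0):
    """Flat-bottom harmonic energy (kcal/mol) on the selected Cα-Cα distances."""
    ca = np.asarray(ca, dtype=float)
    energy = 0.0
    for i, j in pairs:
        d = np.linalg.norm(ca[i] - ca[j])
        # Penalize only deviations larger than b_flat in either direction
        excess = max(0.0, abs(d - d0[i, j]) - b_flat)
        energy += k0 * excess**2
    return float(energy)

# Toy U-shaped 6-residue chain with 3.8 Å steps
ref = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0],
                [7.6, 3.8, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 0.0]])
pairs, d0 = restraint_pairs(ref)
print(pairs)  # → [(0, 4), (0, 5), (1, 5)]

cur = ref.copy()
cur[5] = [0.0, 8.8, 0.0]  # pull residue 5 away from residues 0 and 1
print(distance_flat_bottom(cur, pairs, d0) > 0.0)  # → True
```

Because only internal distances enter, this term is unchanged by rigid-body motion of whole secondary structure elements, which is consistent with the collective movements reported for distance restraints in the results.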
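The study also introduced a combined restraint, which switched between
Cartesian and distance-based restraints according to:

$$E_{\text{comb}} = (1 - \lambda)\, E_{\text{Cart}} + \lambda\, E_{\text{dist}}$$

where E_Cart and E_dist are the Cartesian and distance restraint energies and λ is a
switching parameter that transitions between the two types of restraints
(λ = 0 gives purely Cartesian and λ = 1 purely distance restraints).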
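How a switching parameter might blend the two restraint energies over a run is sketched below, assuming the blend (1 − λ)·E_Cart + λ·E_dist and a linear ramp of λ from 0 to 1 over the trajectory. The ramp shape, the function names, and the fixed, made-up energies in the example are all assumptions for illustration, not the paper's schedule.

```python
def lambda_schedule(t, t_total):
    """Hypothetical linear ramp of the switching parameter: 0 (pure
    Cartesian) at t = 0, rising to 1 (pure distance) at t = t_total."""
    return min(max(t / t_total, 0.0), 1.0)

def combined_energy(e_cart, e_dist, lam):
    """Blend of the two restraint energies: (1 - lam) * E_Cart + lam * E_dist."""
    return (1.0 - lam) * e_cart + lam * e_dist

# Made-up restraint energies (kcal/mol) at three points of a 100 ns trajectory
for t_ns in (0.0, 50.0, 100.0):
    lam = lambda_schedule(t_ns, 100.0)
    print(t_ns, lam, combined_energy(e_cart=2.0, e_dist=6.0, lam=lam))
```

A linear ramp is simply the easiest schedule to reason about; a stepwise or sigmoidal schedule would fit the same interface.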
To compare the performance of these restraint strategies, refinement runs were
carried out with each type of restraint, as well as without restraints. In each run, five
100 ns molecular dynamics (MD) trajectories were performed, with structures saved
every 50 ps (2,000 frames per trajectory), yielding 10,000 protein conformations in total.
This allowed the authors to reassess the impact of restraints on refinement quality.

Refinement at Higher Temperatures:

The study tested elevated temperatures during molecular dynamics (MD)
simulations, moving beyond the room temperature (298 K) used in previous protocols.
Temperature is a crucial factor in conformational sampling, as higher
temperatures can accelerate transitions between different conformational states.
Since protein models often require partial unfolding and refolding during
refinement, increased temperatures may enhance this process. However, higher
temperatures also risk protein instability and thermal denaturation.

The application of restraints with respect to the initial model could mitigate these
risks by preventing excessive unfolding, making it possible to improve sampling
without compromising stability.
To explore the optimal temperature for refinement simulations with restraints, the
study conducted refinement runs at various temperatures: 298.15, 320, 340, 360,
and 380 K. The goal was to identify the temperature at which enhanced
sampling could be achieved while maintaining protein stability, with the
combined restraint being used in these simulations.

Refinement with Multiple Alternative Initial Models.

The study investigated whether refinement is more effective when using alternative
models as initial inputs, rather than the model closest to the native structure. The
hypothesis was that refinement success depends more on overcoming kinetic barriers
than on the proximity of the initial model to the native state, measured by metrics
like GDT-HA or RMSD. Alternative models, even if initially further from the native
structure, could potentially lead to better refinement if they are less trapped
kinetically.

Alternative initial models were generated using template-based modeling. First, a
sequence profile was created by searching against the UniClust30 database using
HHblits. Next, structure homologues were identified using HHsearch and the
Viterbi algorithm in local alignment mode. The top 100 homologous structures
were compared with the initial model using TM-align, and structures with a TM-
score > 0.6 were selected for further modeling. Sequence alignments were
generated using HHalign with the MAC algorithm in global alignment mode,
allowing for up to three alternative sequence alignments. For each alignment, 12
models were generated using MODELLER, and the best-scoring model was
selected. Up to 10 models were selected based on a TM-score cutoff, and if fewer than
two models met the criteria, alternative models were not pursued.
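The selection logic for alternative models can be expressed as a small filter. This sketch assumes the 0.6 TM-score cutoff also applies at the model stage and that models are ranked by TM-score to the initial model; the paragraph above does not specify either, and `select_alternative_models` is a hypothetical helper. The TM-scores themselves would come from TM-align.

```python
def select_alternative_models(candidates, tm_cutoff=0.6, max_models=10, min_models=2):
    """Choose alternative initial models by TM-score to the initial model.

    candidates: list of (model_id, tm_score) pairs. Returns up to
    max_models ids, or an empty list when fewer than min_models pass.
    """
    passing = [c for c in candidates if c[1] > tm_cutoff]
    # Assumed ranking: most similar to the initial model first
    passing.sort(key=lambda c: c[1], reverse=True)
    chosen = [model_id for model_id, _ in passing[:max_models]]
    # Too few usable models: fall back to the single initial model
    return chosen if len(chosen) >= min_models else []

print(select_alternative_models([("m1", 0.82), ("m2", 0.55), ("m3", 0.71)]))  # → ['m1', 'm3']
print(select_alternative_models([("m1", 0.90), ("m2", 0.40)]))               # → []
```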

To further explore the use of multiple templates, selected models and the initial
model were hybridized using Rosetta’s "iterative hybridize" protocol. This was
done with the “simple” option to prevent mutations, and various restraints were
applied. Flat-bottom harmonic restraints were used for structurally conserved
regions where the Cα−Cα RMSD between the models and the initial model was less
than 2 Å. These restraints were applied with k₀ = 0.25 kcal/mol/Ų and b_flat =
1 Å. Instead of the original 50 populations over 50 iterations, this study used 10
populations over 10 iterations for hybridization.
MD simulations were conducted for the original initial model and the alternative
models, with five 100 ns trajectories performed for each. Sampling was carried out
with restraints applied to each initial model. A refined model was generated after
post-sampling for each initial model, and an additional refined model was created by
aggregating sampled conformations from both the original and alternative models.
The study aimed to explore if using alternative models could enhance refinement
performance despite deviations from the native state.

Refinement Protocols.

The refinement performance with the new sampling protocol, which used both the
original initial model and additional alternative models, was benchmarked and
compared against previous refinement protocols. For comparison, two reference
protocols were employed. The first, CASP12-simple, mirrored the protocol used
during CASP12 or the conservative approach from CASP13, and involved MD
simulations with harmonic restraints on Cα coordinates at 298.15 K. The
second, CASP13-simple, simplified the iterative protocol from CASP13 by omitting
iterative sampling and using RWplus scoring for ensemble selection. CASP13-
simple also employed MD simulations with flat-bottom harmonic restraints on
Cα coordinates at 298.15 K.

The new sampling protocol, named CASP14, introduced combined restraints
and performed MD simulations at a higher temperature of 360 K. Two variants of
the CASP14 protocol were tested: one using a single initial model, and the other
using multiple initial models (the original initial model and additional alternative
models). Each simulation protocol was tested independently three times for each
target, and statistical analyses were based on the average values across these
independent runs.

Benchmark Sets:
Two benchmark sets were used for evaluating the performance. The first set included
28 CASP10 refinement targets, which were used to assess the effects of
simulation parameters such as temperature and the type of bias restraints. The
second set contained 103 CASP11−13 refinement targets, serving as the main
test set for comparing different refinement protocols. Five CASP12 targets were
excluded due to the absence of experimental structures.
This thorough benchmarking allowed for a detailed comparison of refinement
performance across different protocols, highlighting the effects of temperature,
restraints, and the inclusion of alternative models in the new CASP14 sampling
protocol.

Analysis of Refinement of Biologically Important Regions.

To assess the effectiveness of physics-based refinement for biologically significant
regions of proteins, the study focused on model quality at ligand binding sites and
structurally variable regions. These areas are critical for protein function, so
improving their model quality through refinement is particularly valuable. Only
biologically relevant ligands were considered; substances used solely for
crystallization, such as polyethylene glycol, were excluded. Ligand binding sites
were defined as residues with any heavy atom within 5 Å of a bound ligand.
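Identifying binding-site residues and scoring them by backbone RMSD can be sketched with NumPy, assuming heavy-atom coordinates are already extracted per residue, crystallization-only ligands have been filtered out, and the model has been superimposed on the experimental structure beforehand. Both function names are hypothetical, and "within 5 Å" is interpreted as a distance of at most 5 Å.

```python
import numpy as np

def binding_site_residues(residue_atoms, ligand_coords, cutoff=5.0):
    """Residues with any heavy atom within `cutoff` Å of any ligand heavy atom.

    residue_atoms: dict mapping residue id -> (n_atoms, 3) heavy-atom coords.
    ligand_coords: (m, 3) ligand heavy-atom coords.
    """
    lig = np.asarray(ligand_coords, dtype=float)
    site = []
    for res_id, atoms in residue_atoms.items():
        atoms = np.asarray(atoms, dtype=float)
        # All residue-atom to ligand-atom distances
        d = np.linalg.norm(atoms[:, None, :] - lig[None, :, :], axis=-1)
        if (d <= cutoff).any():
            site.append(res_id)
    return site

def backbone_rmsd(model_bb, ref_bb):
    """RMSD (Å) over matched backbone atoms; structures assumed pre-superimposed."""
    diff = np.asarray(model_bb, dtype=float) - np.asarray(ref_bb, dtype=float)
    return float(np.sqrt((diff**2).sum(axis=1).mean()))

residues = {10: [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
            11: [[12.0, 0.0, 0.0]],
            12: [[4.0, 3.0, 0.0]]}
ligand = [[5.0, 0.0, 0.0]]
print(binding_site_residues(residues, ligand))  # → [10, 12]
```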

The evaluation of physics-based refinement's performance was based on the
backbone atom RMSD (Root Mean Square Deviation) of residues located in the
ligand binding site. For structurally variable regions, the authors defined variable local
regions (VLRs), which are similar to unreliable local regions (ULRs). A VLR
consists of more than three consecutive residues with an RMSF (Root Mean Square
Fluctuation) greater than 3.8 Å when comparing single template-based homology
models to the initial model. In contrast, ULRs are defined by deviations from
the experimental structure. Only homology models with a TM-score (Template Modeling
Score) greater than 0.75 relative to the initial model were included in this comparison,
ensuring that only reasonably similar models were compared.
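The VLR definition (more than three consecutive residues with RMSF above 3.8 Å) translates into a run-length scan. This sketch assumes the RMSF is computed as each residue's root-mean-square Cα deviation of the homology models from the initial model after superposition, with models of TM-score ≤ 0.75 already removed; that reading of "RMSF" and both helper names are assumptions.

```python
import numpy as np

def rmsf_vs_initial(models_ca, initial_ca):
    """Per-residue RMS deviation of several models' Cα atoms from the initial
    model; models_ca is (n_models, n_res, 3), assumed superimposed on it."""
    diff = np.asarray(models_ca, dtype=float) - np.asarray(initial_ca, dtype=float)[None]
    return np.sqrt((diff**2).sum(axis=-1).mean(axis=0))

def find_vlrs(rmsf, threshold=3.8, min_len=4):
    """(start, end) index ranges, end exclusive, of runs of at least min_len
    consecutive residues whose RMSF exceeds threshold (Å)."""
    flags = np.asarray(rmsf, dtype=float) > threshold
    runs, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                       # a run of variable residues begins
        elif not flagged and start is not None:
            if i - start >= min_len:        # keep runs longer than three residues
                runs.append((start, i))
            start = None
    if start is not None and len(flags) - start >= min_len:
        runs.append((start, len(flags)))    # run reaching the C-terminus
    return runs

rmsf = [1.0, 4.0, 5.0, 6.0, 4.5, 2.0, 5.0, 5.0, 5.0, 1.0]
print(find_vlrs(rmsf))  # → [(1, 5)]
```

The second high-RMSF stretch (residues 6 to 8) is only three residues long, so it does not qualify as a VLR.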

RESULTS

Refinement with Different Forms of Restraints:

In protein structure refinement, combined restraints proved to be the most
effective of the tested types, improving both global and local structure
features. Simulations were conducted at two temperatures, 298.15 K and 360 K,
and at both temperatures combined restraints consistently outperformed the
others (Figure 2A). Individually, Cartesian restraints performed better at
improving global structure similarity, measured by RMSD and GDT-HA, while
distance restraints were better at refining local structural features,
measured by lDDT and SphereGrinder.

Restraints are essential because they restrict sampling to the vicinity of the initial
model, preserving its correct structural features. Cartesian restraints maintain
the overall structure and thus limit losses in global similarity, while distance
restraints act on local inter-residue interactions. Combined restraints balance the
two, improving both global and local features more effectively.

Examples of successful refinement include TR663 (PDB ID: 4EXR), where distance
restraints facilitated better collective movements of secondary structure elements,
enhancing both global and local qualities. However, TR699 (PDB ID: 4KT7), a dimer
structure, posed challenges with distance restraints, which resulted in a misplaced β-
turn, highlighting the limitations in handling oligomerization and protein−protein
interactions. In contrast, Cartesian restraints preserved the β-turn in this case.

A gradual switch from Cartesian to distance restraints during sampling allows
models to benefit from both techniques, covering more conformational space. This
method, paired with scoring functions like RWplus, can produce refinements
comparable or superior to the best individual restraint methods.
In the absence of restraints, model qualities generally deteriorated,
confirming the importance of restraint application in protein structure refinement.

Refinement at Higher Temperatures

Refinement of protein structures using Molecular Dynamics (MD) simulations
showed improved performance at higher temperatures, particularly at 360 K.
Refinement effectiveness increased as temperature rose, peaking at 360 K before
declining at higher temperatures, especially in terms of GDT-HA scores. Elevated
temperatures facilitate the overcoming of energy barriers, allowing for more
conformational transitions and broader sampling of native-like structures.
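Why a modest increase from 298 K to 360 K helps cross barriers can be estimated with the Arrhenius relation k ∝ exp(−ΔG‡/RT). The sketch below is a back-of-the-envelope illustration, not from the paper, assuming a hypothetical 10 kcal/mol barrier, an unchanged prefactor, and a temperature-independent barrier; under those assumptions the 360 K rate is roughly 18 times the 298 K rate.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol·K)

def rate_ratio(barrier, t_high, t_low=298.15):
    """Arrhenius speed-up k(t_high) / k(t_low) for a barrier in kcal/mol,
    assuming an unchanged prefactor and a temperature-independent barrier."""
    return math.exp(barrier / R * (1.0 / t_low - 1.0 / t_high))

# Hypothetical 10 kcal/mol barrier at the temperatures scanned in the study
for t in (320, 340, 360, 380):
    print(t, round(rate_ratio(10.0, t), 1))
```

The estimate keeps growing with temperature, so it cannot explain the decline above 360 K; according to the results, that decline comes from sampling less well-packed structures, not from slower transitions.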

At moderately increased temperatures (360 K), MD simulations sampled a
range of native-like conformations, resulting in well-packed structures with
favorable RWplus scores, reflecting good packing and overall improved structures.
However, at higher temperatures (380 K), the sampled conformations shifted to
less well-packed structures with higher RWplus scores, despite some improvement in
secondary structure arrangements.
Consistency in refinement performance was also examined across different
temperatures. At higher temperatures, independent runs produced more consistent
results due to the ease of escaping from local minima, reducing variability in
sampled conformations. In contrast, lower temperatures resulted in kinetic
trapping in local minima, causing variability between independent runs.
Considering both refinement performance and consistency, MD simulations
at 360 K were determined to be the optimal choice for structure refinement when
combined with restraints, balancing effective sampling and computational
efficiency.

Refinement Using Additional Alternative Initial Models

The authors found that refinement improved when multiple initial models were used,
regardless of the ensemble selection scheme.

Concluding Remarks

This study systematically evaluated and optimized an existing MD-based protein
structure refinement protocol, resulting in significant improvements in both
accuracy and the range of applicability for refining protein structures. The new
protocol demonstrated enhanced performance, especially for larger targets and
models of poorer initial quality. The key components contributing to these
improvements are:
1. Combined restraints: Utilizing both Cartesian restraints and inter-residue
distances based on Cα atoms provided a better balance between preserving
global structures and refining local features.

2. Elevated temperature simulations: Running simulations at 360 K
facilitated more efficient conformational sampling without causing the
structures to unfold, thanks to the restraints applied during the simulations.

3. Alternative initial models: Generating hybrid models from the original
structure and additional template-based models expanded the conformational
space, allowing for better refinement by exploring more refinable regions in
the structure.
This optimized protocol significantly advanced protein structure refinement,
bringing models closer to experimental accuracy and broadening the practical use of
MD-based refinement methods.

Reflections on the Course Experience

This paper reinforced many of the concepts I learned in the course, particularly
regarding MD simulations and my use of GROMACS. Working with GROMACS
helped me understand crucial aspects like solvation models, force field
selection, and energy minimization protocols, which are vital for achieving
stable initial conformations before running simulations. I also gained hands-on
experience with temperature coupling algorithms and how they influence
thermodynamic properties in the simulated system. The ability to run
equilibration phases and analyze RMSD and radius of gyration outputs using
GROMACS was key to mastering protein structure refinement workflows.

As an aspiring dry lab researcher, I found this course especially valuable because it
allowed me to apply computational modeling to real biological systems,
deepening my understanding of molecular interactions. I am grateful to Dr.
HamsaPriya for offering this course, and I am also thankful to the teaching
assistants for their help and guidance. Their support (often during unholy hours)
made complex topics more accessible and ensured a smooth learning experience; I
cannot thank them enough for their patience.

Overall, I am very glad I took this course, as it significantly enhanced both my
technical skills and my appreciation for computational biology; it has also
made me want to pursue a minor in computational biology.

Thank you very very much!
