
BIOINFORMATICS LAB BTech 6th Sem Dr. R. K. Pradhan, OUTR Bhubaneswar

Bioinformatics Lab (2023-24)

Experiment - 7: Homology Modeling

Aim: To perform homology modeling for a given protein.

INSTRUCTIONS: To understand each experiment clearly and make the best use of the content
provided, we suggest that you proceed through the following steps.

1. Start with the theory section, recall the technical background for each step, go through the
manual, and define an overall design and workflow for conducting the homology modeling experiment.
2. In the protocol section, learn the fine details required to perform the experiment by going
through the standardized protocol defined for each of the steps.
3. Practice: work through each step of the experiment in the simulator section to visualize the
entire process.

BASIC THEORY

Homology modeling is a computational approach for three-dimensional protein structure
modeling and prediction. Proteins whose structures are still uncharacterized can be modeled
using homology modeling. This method builds an atomic model based on experimentally
determined known structures that share more than 40% sequence identity with the target
molecule. Models built from templates with less than 40% similarity are less reliable, so such
templates are usually not used. Homology modeling is also known as comparative
modeling.

The principle governing this approach is that if two proteins share a high sequence similarity,
they are more likely to have very similar three-dimensional structures. If one of the protein
sequences has a known structure, then this structure can be mapped onto the unknown
protein with a high degree of confidence. Protein sequences are more conserved than DNA
sequences and are therefore considered to carry greater evolutionary significance.
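
The similarity threshold discussed above can be made concrete with a small calculation. The
sketch below is only an illustration: the function name, the two short aligned sequences and the
choice to count identity over gap-free columns are assumptions, and alignment programs may
normalize identity differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Only columns where both sequences carry a residue are counted.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Made-up aligned sequences, for illustration only.
target = "MKT-AYIAKQR"
template = "MKTLAYLAKQR"

identity = percent_identity(target, template)
usable = identity >= 40.0  # the 40% rule of thumb from the text
print(f"{identity:.1f}% identity, usable as template: {usable}")
```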

While homology modeling predicts the positions of alpha carbons with moderate accuracy, it
is not very reliable in predicting side chains and loops. The other approaches to structure
prediction are threading and ab initio prediction.

Types of modeling:
• Basic Modeling - Modeling using a template with very high similarity with the target sequence.

• Advanced Modeling - In this case, the target is modeled using more than one template such
that regions of the template proteins that share a high identity with portions of the target are
used individually to model these sections.

The overall homology modeling procedure consists of six steps-

Step I - Template Selection


Template selection involves searching the Protein Data Bank (PDB) for homologous proteins
with determined structures. The search can be performed using a heuristic pairwise alignment
search program like BLAST or FASTA. As a rule of thumb, a database protein should have at
least 40% sequence identity, high resolution and the most appropriate cofactors for it to be
considered as a template sequence. The protein sequence whose 3D structure is to be predicted
is called the "target sequence".

Step II – Sequence Alignment


Once the template is identified, the full-length sequences of the template and target proteins
need to be realigned using refined alignment algorithms to obtain optimal alignment. The
alignment gives specific alignment scores.
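
To illustrate what a global alignment with a score looks like, here is a minimal
Needleman-Wunsch sketch. It is not the algorithm used by MODELLER or any specific tool: the
match, mismatch and linear gap scores are arbitrary assumptions, whereas real programs typically
use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Made-up sequences, for illustration only.
aligned_target, aligned_template, total = needleman_wunsch("MKTAY", "MKAY")
print(aligned_target)
print(aligned_template)
print("score:", total)
```

The gap character in the second sequence marks a position where the template has no residue,
which is the kind of region that later needs loop modeling.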

Step III - Backbone Model Building


Once optimal alignment is achieved, the coordinates of the corresponding residues in the template
proteins can simply be copied onto the target protein. If the two aligned residues are identical,
the coordinates of the side chain atoms are copied along with the main chain atoms.
If multiple templates are selected, the average coordinate values of the templates are used.
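
The copying and averaging of coordinates described above can be sketched as follows. This is a
simplified illustration, not what any modeling package does internally: the function names and
the coordinate values are hypothetical, and the templates are assumed to be already superimposed
in a common frame.

```python
def average_coordinates(coords):
    """Average a list of (x, y, z) tuples taken from different templates."""
    if not coords:
        raise ValueError("need at least one coordinate")
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def build_backbone(aligned_columns):
    """Build one coordinate per alignment column.

    Each column is a list with the coordinates contributed by the templates
    that have a residue at that position. An empty list means no template
    covers the position, so it is left as None for loop modeling.
    """
    return [average_coordinates(col) if col else None for col in aligned_columns]


# Hypothetical alpha-carbon coordinates for three alignment columns.
columns = [
    [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)],  # covered by two templates: averaged
    [(1.0, 2.0, 3.0)],                   # covered by one template: copied
    [],                                  # gap in every template
]
backbone = build_backbone(columns)
print(backbone)
```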

Step IV – Loop Modeling


After the sequence alignment, there are often regions created by insertions and deletions that
lead to gaps in the alignment. These gaps are filled by loop modeling, which is less accurate
and is a major source of error. Currently, two main techniques are used to approach the problem:

• The database searching method - this involves finding loops from known protein structures and
superimposing them onto the two stem regions (main chains mostly) of the target protein. Some
specialized programs like FREAD and CODA can be used.

• The ab initio method - this generates many random loops and searches for one that has
reasonably low energy and φ and ψ angles in the allowable regions in the Ramachandran plot.

Step V - Side Chain Refinement


After the main chain atoms are built, the positions of side chains must be determined. This is
important in evaluating protein–ligand interactions at active sites and protein–protein
interactions at the contact interface.
A side chain can be built by searching every possible conformation for every torsion angle of
the side chain to select the one that has the lowest interaction energy with neighboring atoms.
Alternatively, a rotamer library can be used, which contains all the favorable side chain
torsion angles extracted from known protein crystal structures.
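
The idea of choosing the conformation with the lowest interaction energy can be sketched with a
toy example. Everything here is an assumption made for illustration: the rotamer names and
positions, the neighbor atoms, and the simple repulsion term, which stands in for a real energy
function.

```python
import math


def clash_energy(atom, neighbors, min_dist=3.0):
    """Toy repulsion: penalize every neighbor closer than min_dist (in angstroms)."""
    energy = 0.0
    for other in neighbors:
        d = math.dist(atom, other)
        if d < min_dist:
            energy += (min_dist - d) ** 2
    return energy


def pick_rotamer(rotamers, neighbors):
    """Return the name of the rotamer with the lowest toy energy.

    rotamers maps a rotamer name to the (x, y, z) position where that rotamer
    would place the tip atom of the side chain.
    """
    return min(rotamers, key=lambda name: clash_energy(rotamers[name], neighbors))


# Hypothetical rotamer library entry and hypothetical neighboring atoms.
rotamers = {
    "gauche+": (1.0, 0.0, 0.0),
    "trans": (0.0, 4.0, 0.0),
    "gauche-": (-1.0, 0.0, 0.0),
}
neighbors = [(2.0, 0.0, 0.0), (-2.5, 0.0, 0.0)]

best = pick_rotamer(rotamers, neighbors)
print("selected rotamer:", best)
```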

Step VI - Model Refinement and Model Evaluation


This step carries out energy minimization on the entire model, which adjusts the
relative positions of the atoms so that the overall conformation of the molecule has the lowest
possible potential energy. The goal of energy minimization is to relieve steric collisions without
altering the overall structure. Potential energy calculations are also applied in the loop and
side chain modeling steps to improve the model. Model refinement can also be done by molecular
dynamics simulation, which moves the atoms toward a global minimum by applying various
simulation conditions (heating, cooling, inclusion of water molecules) and thus has a better
chance of finding the true structure.

The final model has to be evaluated by checking the φ–ψ angles, chirality, bond lengths, close
contacts and other stereochemical properties. Various online protein validation software
packages are available, such as Procheck, WHATIF, ANOLEA, Verify3D and PROSA.

• Various comprehensive modeling programs are available, such as Modeller, SWISS MODEL,
Schrodinger and 3D-JIGSAW.

• A successful model depends on template selection, algorithm used and the validation of the
model.

PROCEDURE
Materials Required:
• System and Internet access
• Software: MODELLER9v8

The protein sequence whose 3D structure needs to be built is obtained from Uniprot
in .fasta format.
URL: http://www.uniprot.org/

Protein sequence retrieved from Uniprot in FASTA format

For this experiment, we have used MODELLER9v8, which is a homology modeling software package.
All the input files for running the software are available here. The software accepts
sequences in PIR format.
1) The above FASTA sequence is converted into PIR format to make it readable by the
software. This file must have the .ali extension.
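
The FASTA-to-PIR conversion in step 1 can be done by hand in a text editor; the sketch below
shows the same transformation in Python. It is a minimal illustration under stated assumptions:
the function name, the alignment code "demo" and the short sequence are made up, the input is
assumed to contain a single FASTA record, and the second header line follows the layout commonly
used for target sequences in MODELLER .ali files.

```python
def fasta_to_pir(fasta_text: str, code: str) -> str:
    """Convert a single-record FASTA string to a PIR entry for a target sequence."""
    lines = [ln.strip() for ln in fasta_text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like FASTA")
    # Join the sequence lines of the (single) record, skipping blank lines.
    sequence = "".join(ln for ln in lines[1:] if ln)
    header = f">P1;{code}"
    # Ten colon-separated fields; most are left empty for a target sequence.
    description = f"sequence:{code}:::::::0.00: 0.00"
    # Wrap the sequence at 75 characters per line and terminate it with '*'.
    body = "\n".join(sequence[i:i + 75] for i in range(0, len(sequence), 75))
    return f"{header}\n{description}\n{body}*\n"


# Made-up FASTA record, for illustration only.
fasta = """>demo hypothetical protein
MKTAYIAKQRQISFVKSHFSRQ
LEERLGLIEVQAPILSRVGDGT
"""

pir = fasta_to_pir(fasta, "demo")
print(pir)

# To save it for the modeling software, one could write it to a .ali file:
# with open("demo.ali", "w") as handle:
#     handle.write(pir)
```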

2) After the .ali file is created, “build_profile.py” is searched for in the available input files.
The “build_profile.py” file is obtained and the necessary changes are made.

3) The script file is now run using the command interpreter as shown below:

On running the betalactoglobulin_build_profile.py Python file, a .prf file is obtained, which is
the profile file of the query sequence and its homologs.

The most important columns in the profile.build() output are the second, eleventh and twelfth
columns. The second column reports the code of the PDB sequence that was compared with the
target sequence. The eleventh column reports the percentage sequence identity between
betalactoglobulin and a PDB sequence, normalized by the length of the alignment. In general, a
sequence identity value above approximately 35% indicates a potential template. A better measure
of the significance of the alignment is given in the twelfth column by the e-value of the
alignment. To select the most appropriate template for our query sequence from the six similar
structures, we will use the alignment.compare_structures() command to assess the structural and
sequence similarity between the possible templates (file "betalactoglobulin_compare.py").
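
The selection logic described above (keep hits with sufficient identity and a significant
e-value) can be sketched in plain Python. The hit records, template codes and the e-value cutoff
are hypothetical values chosen for illustration; they are not read from a real .prf file.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    code: str        # PDB code plus chain, as listed in the profile
    identity: float  # percentage sequence identity to the target
    evalue: float    # e-value of the alignment


def candidate_templates(hits, min_identity=35.0, max_evalue=0.01):
    """Keep hits that pass both thresholds, most significant first."""
    kept = [h for h in hits if h.identity >= min_identity and h.evalue <= max_evalue]
    # Sort by e-value (ascending), then by identity (descending).
    return sorted(kept, key=lambda h: (h.evalue, -h.identity))


# Hypothetical profile hits, for illustration only.
hits = [
    Hit("tmplA", 45.0, 0.0),
    Hit("tmplB", 28.0, 0.0),    # identity too low
    Hit("tmplC", 41.0, 1e-5),
    Hit("tmplD", 60.0, 0.5),    # e-value not significant
]

selected = [h.code for h in candidate_templates(hits)]
print(selected)
```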

4) All the structure files from the PDB database whose e-value is 0.0 are identified from the
.prf file obtained above, and they are listed in the compare.py file for identification of the
best template. The input file “alignment.compare()” is selected and all the templates are added.

The selected PDB files are listed in the compare.py file because any one of them could be the
most suitable template.
5) Now, the script file is run using the command interpreter as shown below.

6) After running the compare file, the following dendrogram is displayed. The templates are
compared with each other. This comparison also gives information about the crystallographic
resolution of each PDB structure. The structure with a better crystallographic R-factor and higher
overall sequence identity to the query sequence is selected to be aligned with the target
sequence. The comparison shown above covers both sequence and structural similarity. The
crystallographic resolution of the template should be in the range of about 1.5 to 2.5 Å, and
the sequence similarity should be as high as possible.

For example, in the situation shown below, 1bdmA has a crystallographic resolution of about 1.8 Å
and a sequence similarity of more than 40%, so it can be considered a good template. When two
templates both meet the required parameters, both should be selected. The compare.log file is
consulted to choose the template for building the structure of the unknown protein and to
prepare the align2d.py file.

7) The template selected above is the one on which the model of the query will be
built. The align2d.py file is modified as shown below. Combining the original .ali file
with the selected template produces two alignment files, in .PIR and .PAP formats. After running
the align2d.py file, an alignment file is created which is used for modeling.

8) Now the script file is run using the command interpreter as shown below:

The following file is the alignment file produced by the software. In this file, the query
sequence is globally aligned with the template and the '*' indicates the perfect match in the
sequence. If the alignment results are not satisfactory, we can go back to the template selection
step and replace the template with another protein with comparable similarity.

9) Once a target-template alignment is constructed, the software calculates a 3D model of the
target completely automatically, using its automodel class. The following script will generate
five similar models of betalactoglobulin based on the selected template structure and on
the alignment in the alignment file (file "betalactoglobulin-model-single.py").

The number of models to be generated must be specified here if multiple models are
desired at the end of the session.

The Script file is run using the command interpreter as shown below:

10) The output file reports two statistical scores for each model: the DOPE (Discrete Optimized
Protein Energy) score and the GA341 score. After the models are created, their DOPE
scores and GA341 scores are inspected in order to rank the models on the basis of their energy
levels. Models with the lowest DOPE assessment score, or with the highest GA341 assessment
score, have the most stable minimized energy and are therefore picked from the rest of the
models for further protein structure analysis. GA341 scores range from 0.0 (worst) to 1.0
(native-like), but DOPE score values are more reliable in distinguishing good models from
bad ones. In the example shown, B99990001.pdb and B99990002.pdb were selected
as the optimal models with the most stable structure and the lowest energy.
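
The ranking rule above (lowest DOPE score first) can be sketched as follows. The dictionary
layout and the score values are hypothetical and only illustrate the selection; in practice the
scores would be read from the software's output for each generated model.

```python
def rank_by_dope(models):
    """Return the successfully built models sorted by DOPE score, lowest (best) first."""
    built = [m for m in models if not m["failed"]]
    return sorted(built, key=lambda m: m["dope"])


# Hypothetical scores for five generated models, for illustration only.
models = [
    {"name": "betalactoglobulin.B99990001.pdb", "dope": -18950.0, "ga341": 1.00, "failed": False},
    {"name": "betalactoglobulin.B99990002.pdb", "dope": -19120.0, "ga341": 1.00, "failed": False},
    {"name": "betalactoglobulin.B99990003.pdb", "dope": -18400.0, "ga341": 0.98, "failed": False},
    {"name": "betalactoglobulin.B99990004.pdb", "dope": -18110.0, "ga341": 0.97, "failed": False},
    {"name": "betalactoglobulin.B99990005.pdb", "dope": -20000.0, "ga341": 0.00, "failed": True},
]

ranked = rank_by_dope(models)
best_two = [m["name"] for m in ranked[:2]]
print(best_two)
```

Failed models are excluded before sorting so that an incomplete model cannot be picked merely
because it carries a low number.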

Two structures were built whose snapshots are provided below.



The models created contain the co-ordinates of the model in .pdb format. These models can be
viewed by any molecular visualization software able to read .pdb files such as Rasmol,
Chimera, Pymol, etc.

Evaluation/Validation of the model generated:

11) Once the models are generated, they must be evaluated for missing residues and side chains,
and checked for steric clashes. The model evaluation can be carried out by the software
itself or with web-based tools such as ProCheck, ProSA and Verify3D.
For this example, we have validated the model using the MODELLER software. We need to
obtain the script "evaluate_model.py" and specify in it the name of the pdb file to be evaluated.
We have specified the file name "betalactoglobulin.B99990002.pdb".

12) Now, the script file is run using the command interpreter as shown below:

13) The resulting profile is written to a file "betalactoglobulin.profile", which can be used as
input to a graphing program. The plot shown below, for example, was made with GNUPLOT.
Alternatively, the plot_profiles.py script included in the tutorial zip file can be used
to plot profiles with the Python matplotlib package.

The above graph compares the per-residue DOPE score of the template with that of
the generated model. By analyzing the graph, we can see that the template residues in
the ranges 50 - 80 and 240 - 270 are the most conserved with respect to the newly generated
model. Literature studies on the protein also indicated that one of these regions is the
active site of the protein. The variable regions of the model can be evaluated with more
comprehensive programs such as Advance Modeling, Schrodinger, Robetta, etc.
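
The comparison of per-residue profiles can also be done numerically rather than by eye. The
sketch below smooths two profiles with a sliding window and flags residues where the model scores
noticeably worse (higher) than the template. The score values, the window size and the threshold
are hypothetical, and the two profiles are assumed to be already matched residue by residue.

```python
def smooth(values, window=3):
    """Sliding-window average; the window shrinks at the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):min(len(values), i + half + 1)]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def flag_residues(model_profile, template_profile, threshold):
    """Return 1-based residue numbers where the model exceeds the template by more than threshold."""
    return [
        i + 1
        for i, (m, t) in enumerate(zip(model_profile, template_profile))
        if m - t > threshold
    ]


# Hypothetical per-residue scores (arbitrary units), for illustration only.
model_scores = [-4.0, -4.0, -1.0, -1.0, -4.0]
template_scores = [-4.0, -4.0, -4.0, -4.0, -4.0]

flagged = flag_residues(smooth(model_scores), smooth(template_scores), threshold=1.2)
print("residues to inspect:", flagged)
```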

Note: For more details, refer to the MODELLER manual.


----- ALL THE BEST----
*********
