Experiment-7(HOMOLOGY MODELING)
INSTRUCTIONS: To understand each experiment clearly and make the best use of the
content provided, we suggest that you proceed as follows.
1. Start with the theory section: recall the technical background for each step, go through the
manual, and define an overall design and workflow for conducting a homology modeling experiment.
2. In the protocol section, learn the fine details required to perform the experiment by going
through the standardized protocol defined for each step.
3. Practice: work through each step of the experiment in the simulator section to visualize the entire
process.
BASIC THEORY
The principle governing this approach is that if two proteins share high sequence similarity,
they are likely to have very similar three-dimensional structures. If one of the two proteins
has a known structure, that structure can be superimposed onto the protein of unknown
structure with a high degree of confidence. Protein sequences are more conserved than DNA
sequences, and hence carry greater evolutionary significance.
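To make "sequence similarity" concrete, the following minimal sketch computes the percent identity between two sequences that have already been aligned. The function name percent_identity, the gap character "-" and the example sequences are assumptions for illustration only; normalizing by the number of gap-free columns is just one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up example: 8 gap-free columns, 7 of them identical.
print(percent_identity("MKT-AYIAK", "MKTLAYLAK"))
```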
While homology modeling predicts the positions of alpha carbons with moderate accuracy, it
is less reliable in predicting side chains and loops. The other approaches to structure
prediction are threading and ab initio prediction.
BIOINFORMATICS LAB BTech 6th Sem Dr. R. K. Pradhan, OUTR Bhubaneswar
Types of modeling:
• Basic Modeling - Modeling using a single template with very high similarity to the target sequence.
• Advanced Modeling - The target is modeled using more than one template, such
that regions of the template proteins that share high identity with portions of the target are
used individually to model those sections.
Loops that cannot be taken from the template are modeled by one of two methods:
• The database searching method - this involves finding loops from known protein structures and
superimposing them onto the two stem regions (mostly main chain) of the target protein. Some
specialized programs like FREAD and CODA can be used.
• The ab initio method - this generates many random loops and searches for one that has
reasonably low energy and φ and ψ angles in the allowed regions of the Ramachandran plot.
The final model must be evaluated by checking the φ–ψ angles, chirality, bond lengths, close
contacts and other stereochemical properties. Various protein validation software
packages are available online, such as PROCHECK, WHAT IF, ANOLEA, Verify3D and ProSA.
• Various comprehensive modeling programs are available, such as MODELLER, SWISS-MODEL,
Schrodinger and 3D-JIGSAW.
• A successful model depends on template selection, the algorithm used and the validation of the
model.
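The φ–ψ check mentioned above rests on computing backbone dihedral angles from atomic coordinates. The sketch below shows one standard way to compute a signed dihedral angle from four points using only the Python standard library; the function name dihedral and the test coordinates are illustrative assumptions, not part of any validation package. For residue i, φ is defined by the atoms C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1).

```python
import math


def dihedral(p0, p1, p2, p3):
    """Return the signed dihedral angle (degrees) defined by four 3D points."""
    def sub(u, v):
        return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1 = cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(dot(b1, b1))
    y = dot(b1, cross(n1, n2)) / b1_len  # proportional to sin(angle)
    x = dot(n1, n2)                      # proportional to cos(angle)
    return math.degrees(math.atan2(y, x))


# Illustrative geometry only: four points forming a right-angle twist.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

A validation tool would compute φ and ψ for every residue this way and then compare each pair against the allowed regions of the Ramachandran plot.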
PROCEDURE
Materials Required:
• System and Internet access
• Software: MODELLER9v8
For this experiment, we have used MODELLER9v8, which is homology modeling software.
All the input files needed to run the software are available here. The software accepts
sequences in PIR format.
1) The above FASTA sequence is converted into PIR format to make it readable by the
software. This file must have the .ali extension.
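The FASTA-to-PIR conversion in step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the lab: the function name fasta_to_pir, the code "demo" and the short sequence are assumptions, and the header layout follows the MODELLER-style PIR convention of a ">P1;code" line followed by a ten-field, colon-separated description line and a sequence terminated by "*".

```python
def fasta_to_pir(fasta_text: str, code: str, width: int = 75) -> str:
    """Convert a single-record FASTA string into a MODELLER-style PIR record."""
    lines = [ln.strip() for ln in fasta_text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like FASTA")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("only a single FASTA record is supported")
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("no sequence found")
    header = f">P1;{code}"
    # Fields: type:code:first res:chain:last res:chain:name:source:resolution:R-factor
    description = f"sequence:{code}:::::::0.00: 0.00"
    terminated = sequence + "*"  # '*' marks the end of the sequence
    body = "\n".join(terminated[i:i + width]
                     for i in range(0, len(terminated), width))
    return "\n".join([header, description, body]) + "\n"


# Made-up input; the result would be saved with the .ali extension.
print(fasta_to_pir(">demo protein\nMKT\nAYI\n", "demo"))
```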
2) After the .ali file is created, the script “build_profile.py” is located among the available
input files and the necessary changes are made to it.
3) The script file is now run using the command interpreter as shown below:
The most important columns in the profile.build() output are columns II, XI and XII.
Column II reports the code of the PDB sequence that was compared with the target sequence.
Column XI reports the percentage sequence identity between betalactoglobulin and a
PDB sequence, normalized by the length of the alignment. In general, a sequence identity value
above approximately 35% indicates a potential template. A better measure of the significance
of the alignment is given in column XII by the e-value of the alignment. To select the most
appropriate template for our query sequence from among the six similar structures, we will use the
alignment.compare_structures() command to assess the structural and sequence similarity
between the possible templates (file "betalactoglobulin_compare.py").
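The screening rule described above (sequence identity above roughly 35%, with the e-value as the stronger criterion) can be sketched as a simple filter. The Hit record, the function candidate_templates and all codes and numbers below are made-up placeholders, not values from a real profile.build() run; the default e-value cutoff of 0.0 mirrors the cutoff used in the next step.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    code: str        # PDB code of the database sequence
    identity: float  # percent sequence identity to the target
    evalue: float    # e-value of the alignment


def candidate_templates(hits: List[Hit],
                        min_identity: float = 35.0,
                        max_evalue: float = 0.0) -> List[Hit]:
    """Keep hits that pass both cutoffs, most significant first."""
    keep = [h for h in hits
            if h.identity >= min_identity and h.evalue <= max_evalue]
    # Lower e-value first; ties broken by higher identity.
    return sorted(keep, key=lambda h: (h.evalue, -h.identity))


# Placeholder hits for illustration only.
hits = [Hit("tmplA", 45.0, 0.0), Hit("tmplB", 30.0, 0.0),
        Hit("tmplC", 50.0, 0.0), Hit("tmplD", 60.0, 0.5)]
print([h.code for h in candidate_templates(hits)])
```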
4). All the PDB structures whose e-value is 0.0 are taken from the .prf file obtained
above and are listed in the compare.py file to identify the best
template. The input file “alignment.compare()” is selected and all the templates are added.
The selected PDB files are listed in the compare.py file because any one of them could be the
probable template.
5). Now, the script file is run using the command interpreter as shown below.
6). After running the compare file, the following dendrogram is displayed. The templates are
compared with each other in terms of both sequence and structural similarity. The output also
gives the crystallographic resolution of each PDB structure. The structure with the better
crystallographic resolution and higher overall sequence identity to the query sequence is
selected to be aligned with the target sequence. The crystallographic resolution of the
template should be in the range 1.5 to 2.5 Å, and the sequence similarity should be as high
as possible.
For example, in the situation shown below, 1bdma has a resolution of 1.8 Å (shown as @1.8) and a
sequence similarity of >40%; therefore it can be considered a good template. When two templates
both meet the required parameters, both should be selected. The compare.log file is used to
choose the template for building the structure of the unknown protein and to prepare the
align2d.py file.
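The selection rule in step 6 can be expressed as a short filter. This is a sketch under stated assumptions: the 1.5 to 2.5 quality range is treated as a crystallographic resolution in Å, the 40% identity cutoff is taken from the example above, and the Template record, the pick_templates function and every code and value in the list are hypothetical placeholders rather than real compare.log output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Template:
    code: str
    resolution: float  # assumed to be in Å; lower is better
    identity: float    # percent identity to the target sequence


def pick_templates(templates: List[Template],
                   res_range: Tuple[float, float] = (1.5, 2.5),
                   min_identity: float = 40.0) -> List[Template]:
    """Return every template meeting both criteria, best identity first."""
    low, high = res_range  # range applied literally, as stated in the text
    ok = [t for t in templates
          if low <= t.resolution <= high and t.identity > min_identity]
    return sorted(ok, key=lambda t: (-t.identity, t.resolution))


# Placeholder candidates for illustration only.
candidates = [Template("tmpl1", 1.8, 44.0), Template("tmpl2", 2.0, 38.0),
              Template("tmpl3", 3.0, 50.0), Template("tmpl4", 2.2, 44.0)]
print([t.code for t in pick_templates(candidates)])
```

Returning a list rather than a single winner reflects the point that two templates meeting the criteria should both be kept.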
7). The template selected above is the one on which the model of the query will be
built. The align2d.py file is modified as shown below. Appending the original .ali file
and the selected template produces two alignment files, in .PIR and .PAP format. After running
the align2d.py script, an alignment file is created which is used for modeling.
8). Now the script file is run using the command interpreter as shown below:
The following file is the alignment file produced by the software. In this file, the query
sequence is globally aligned with the template and the '*' indicates the perfect match in the
sequence. If the alignment results are not satisfactory, we can go back to the template selection
step and replace the template with another protein with comparable similarity.
9). The number of models to be generated must be specified here if multiple models are
desired at the end of the session.
The script file is run using the command interpreter as shown below:
10). The output file reports two statistical scores for each model: the DOPE (Discrete Optimized
Protein Energy) score and the GA341 score. After the models are created, their DOPE
and GA341 scores are used to rank the models on the basis of their energy
levels. Models with the lowest DOPE assessment score, or with the highest GA341 assessment
score, are taken to have the most stable minimized energy and are therefore picked from the
rest of the models for further protein structure analysis. GA341 scores range from 0.0 (worst)
to 1.0 (native-like), but DOPE score values are more reliable in distinguishing good models
from bad ones. In the example shown, B99990001.pdb and B99990002.pdb were selected
as the optimal models with the most stable structure and the lowest energy.
The models created contain the co-ordinates of the model in .pdb format. These models can be
viewed by any molecular visualization software able to read .pdb files such as Rasmol,
Chimera, Pymol, etc.
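The ranking rule in step 10 (lowest DOPE first, with GA341 closer to 1.0 preferred) can be sketched as a sort. The function rank_models and the model names and scores below are placeholders invented for illustration; they are not output from this experiment.

```python
from typing import Dict, List, Tuple


def rank_models(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Rank model names best-first.

    `scores` maps a model name to (DOPE score, GA341 score).
    Lower DOPE is better; GA341 (0.0 to 1.0, higher is better)
    is used only to break ties, since DOPE is the more reliable score.
    """
    return sorted(scores, key=lambda name: (scores[name][0], -scores[name][1]))


# Placeholder scores for illustration only.
example = {
    "model_a": (-38000.0, 1.0),
    "model_b": (-38500.0, 1.0),
    "model_c": (-38500.0, 0.9),
}
print(rank_models(example))
```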
11). Once the models are generated, they must be evaluated for missing residues or side chains
and checked for steric clashes. The model evaluation can be carried out by the software
itself or with web-based tools such as PROCHECK, ProSA and Verify3D.
For this example we have validated the model using the MODELLER software. We need to
obtain the script "evaluate_model.py" and specify the name of the PDB file to be evaluated.
Here the file name is "betalactoglobulin.B99990002.pdb".
12). Now, the script file is run using the command interpreter as shown below:
13). The resulting DOPE profile is written to a file "betalactoglobulin.profile", which can be
used as input to a graphing program. For example, the plot shown below was produced with GNUPLOT.
Alternatively, the plot_profiles.py script included in the tutorial zip file can be used
to plot profiles with the Python matplotlib package.
The above graph displays the difference in the per-residue DOPE score between the template and
the generated model. By analyzing the graph, we can see that the template residues
in the ranges 50 - 80 and 240 - 270 are the most conserved in the newly generated
model. Further, some literature studies on the protein also revealed one of these regions to be
the active site of the protein. The variable regions of the model can be evaluated with more
comprehensive programs like Advance Modeling, Schrodinger, Robetta etc.
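Before plotting, a per-residue energy profile is usually read into a list and smoothed so that broad regions stand out. The sketch below assumes, without confirmation from this manual, that comment lines in the profile file start with "#" and that the per-residue energy is the last whitespace-separated column; the function names read_profile and smooth and the sample text are illustrative only.

```python
from typing import List


def read_profile(text: str) -> List[float]:
    """Collect the last column of every non-comment, non-blank line."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        values.append(float(line.split()[-1]))
    return values


def smooth(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Made-up profile text in the assumed layout.
sample = "# residue energies\n1 ALA -0.5\n2 GLY -0.1\n3 SER -0.3\n"
print(smooth(read_profile(sample)))
```

Smoothed profiles for the template and the model can then be passed to GNUPLOT or matplotlib and compared region by region.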