Lecture For HND Homology Modelling

Homology modeling, also known as comparative or template-based modeling, predicts protein structures based on known homologous sequences. The process involves several steps including template identification, alignment correction, backbone generation, and model validation, with accuracy heavily dependent on the sequence identity between the target and template. This method is particularly useful for proteins lacking experimental structure data, allowing insights into their function and activity.

Uploaded by

maliknoman0365

HOMOLOGY MODELING

Lecture by
Sidra Hassan
• "Homology modeling" is also called comparative modeling or sometimes template-based modeling (TBM).
• It is based on the observation that "similar sequences exhibit similar structures".

• A known structure is used as a template to model an unknown (but likely similar) structure with a known sequence.

• Homology modeling is an in silico method that predicts the tertiary structure of an amino acid sequence based on a homologous, experimentally determined structure.

• Some proteins with as little as 25% sequence similarity have been observed to adopt the same 3D structure.

• It offers a method to "predict" the 3D structure of proteins for which X-ray or NMR data cannot be obtained.

• It can be used in understanding function, activity, specificity, etc.
Steps required in homology modeling

• Template Identification
• Alignment Correction
• Backbone Generation
• Loop Modeling
• Side-Chain Modeling
• Model Optimization
• Model Validation
Step 1: Template Recognition and Initial Alignment
• When the percentage identity between the sequence of interest and a possible template is high enough, the template can be detected with simple sequence alignment programs such as BLAST or FASTA.

• To identify these hits, the program compares the query sequence to all the sequences in the database.

• First, we search a structural database of proteins for sequences related to the target sequence (the templates).
• The accuracy of the model depends on selecting a proper template.

• FASTA and BLAST, available from EMBL-EBI and NCBI, can be used.

• This gives a probable set of templates, but the final one is not yet decided.

• After the initial alignments, and after finding structurally conserved regions among the templates, we choose the final template.
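As a rough illustration of how candidate templates might be compared once initial alignments exist, the sketch below computes percent identity from two already-aligned sequences and sorts the candidates. It is only a minimal sketch: the function names, the input layout, and the choice to ignore gap columns when computing identity are assumptions made here (several definitions of percent identity are in use), and it does not replace a BLAST or FASTA search.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns where neither sequence has a gap.

    Both inputs are assumed to be rows of the same alignment ('-' marks a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are skipped (an assumption of this sketch)
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def rank_templates(alignments):
    """Sort candidate templates by percent identity, highest first.

    alignments: dict mapping a (hypothetical) template name to a pair
    (aligned_target, aligned_template).
    """
    scores = {
        name: percent_identity(target, template)
        for name, (target, template) in alignments.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Sequence identity is only one input to the final choice; the structurally conserved regions found among the candidates still have to be considered.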
Step 2: Alignment in Homology Modeling
Sequence alignment is the central technique in homology modeling.

• It is used to determine which regions of the reference proteins are conserved in sequence,

• and hence suggests where the reference proteins may also be structurally conserved.

• Once the structurally conserved regions (SCRs) are found, the alignment is used to establish a one-to-one correspondence between the amino acids of the reference proteins and the target in the SCRs.
• This provides the basis for transferring coordinates from the reference to the model.

• A small error in the alignment can lead to a big error in the structural model.

• Multiple alignments are usually better than pairwise alignments.
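The one-to-one correspondence between target and reference residues can be read directly off a pairwise alignment. The following is a minimal sketch, assuming both sequences are given as rows of the same alignment with '-' for gaps; the function name and the 0-based indexing are choices made for this example.

```python
def residue_correspondence(aligned_target, aligned_template):
    """Map target residue index -> template residue index (both 0-based).

    Only columns where both sequences have a residue produce an entry;
    residues facing a gap have no counterpart and are left out.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    target_index = 0
    template_index = 0
    for t, s in zip(aligned_target, aligned_template):
        if t != "-" and s != "-":
            mapping[target_index] = template_index
        # Advance each counter only when that sequence contributed a residue.
        if t != "-":
            target_index += 1
        if s != "-":
            template_index += 1
    return mapping
```

Because every later step copies coordinates through this mapping, shifting the alignment by even one column assigns each residue the coordinates of its neighbour, which is why small alignment errors become large model errors.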
Local Alignment
• Used when comparing sequences of different lengths

• Used when the proteins are from different protein families

Tools based on local alignment

• BLAST & FASTA – alignment against databases

• LALIGN & EMBOSS align – alignment of two sequences

• In fact there are more tools; these are the most widely used
Multiple Sequence Alignment
• So far, this has all been about pairwise alignment.

• Sometimes it may be difficult to align two sequences in a region where the percentage sequence identity is very low. One can then use other sequences from homologous proteins to find a solution.

• In homology modeling in general, we would like to use more than two reference proteins as templates.
• This helps in finding conserved domains among similar reference proteins,

• and therefore provides more information about structurally conserved domains in the sequences.

• Multiple alignment is more difficult than pairwise alignment because the number of possible alignments increases exponentially with the number of sequences to be aligned.

• No ideal method exists; several heuristic algorithms are used.
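To make the idea of finding conserved regions from a multiple alignment concrete, here is a small sketch that scores each column of an existing alignment and reports runs of well-conserved columns. The scoring rule (fraction of sequences carrying the most common residue, with gaps counting against the column), the 0.8 threshold, and the minimum run length of 3 are all illustrative assumptions, not values from the lecture. The sketch does not perform the alignment itself.

```python
from collections import Counter


def column_conservation(msa):
    """Per-column conservation for a list of equal-length aligned sequences.

    Score = (count of the most common residue) / (number of sequences).
    Gaps are never counted as the common residue, so gapped columns score lower.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have equal length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in msa if seq[col] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(msa))
    return scores


def conserved_segments(scores, threshold=0.8, min_length=3):
    """Return half-open (start, end) column ranges scoring >= threshold."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores)))
    return segments
```

Columns with little gapping and high conservation are the natural candidates for structurally conserved regions; heavily gapped columns point to variable regions.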
Assignment of coordinates within conserved region
• Once the correspondence between amino acids in the reference and model sequences has been made, the coordinates for an SCR can be assigned.
• The reference proteins' coordinates are used as a basis for this assignment.

• Where the side chains of the reference and model proteins are the same at corresponding locations along the sequence, all the coordinates for the amino acid are transferred.

• Where they differ, the backbone coordinates are transferred, but the side chain atoms are automatically replaced to preserve the model protein's residue types.
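The transfer rule described above can be sketched in a few lines. This is an illustration under assumptions: the data layout (a list of residue code plus atom dictionary), the function name, and the use of PDB-style atom names N, CA, C, O are choices made here. Where residue types differ, the sketch only copies the backbone and leaves the side chain to be built in a later step rather than replacing it automatically.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def transfer_coordinates(target_seq, template_residues, mapping):
    """Copy coordinates from template to model for aligned positions.

    target_seq: target sequence as one-letter codes.
    template_residues: list of (one_letter_code, {atom_name: (x, y, z)}).
    mapping: dict of target index -> template index for aligned residues.
    Returns dict of target index -> {atom_name: (x, y, z)}.
    """
    model = {}
    for target_index, template_index in mapping.items():
        template_code, atoms = template_residues[template_index]
        if target_seq[target_index] == template_code:
            # Same residue type: transfer every atom, side chain included.
            model[target_index] = dict(atoms)
        else:
            # Different residue type: transfer the backbone only.
            model[target_index] = {
                name: xyz for name, xyz in atoms.items() if name in BACKBONE_ATOMS
            }
    return model
```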
Structurally Conserved Regions (SCRs)

• Correspond to the most stable structures or regions (usually interior) of the protein

• Correspond to sequence regions with the lowest level of gapping and the highest level of sequence conservation

• Usually correspond to secondary structures

• Establish structural guidelines for the family of proteins under consideration

• The first step in building a model protein by homology is determining which regions are structurally conserved or constant among all the reference proteins

• The target protein is supposed to assume the same conformation in conserved regions

• Usually contain alpha-helices and beta sheets


Structurally Variable Regions (SVRs)

• Correspond to the least stable or most flexible regions (usually exterior) of the protein

• Correspond to sequence regions with the highest level of gapping and the lowest level of sequence conservation

• Usually correspond to loops and turns


Step 3: Backbone Generation
• Given a template and an alignment, the information contained
therein must be used to generate a three-dimensional structural
model of the target, represented as a set of Cartesian coordinates
for each atom in the protein.

• When the alignment is ready, the actual model building can start. Creating the backbone is trivial for most of the model:

• One simply copies the coordinates of those template residues that show up in the alignment with the model sequence. If two aligned residues differ, only the backbone coordinates (N, Cα, C and O) can be copied. If they are the same, one can also include the side chain (at least the more rigid side chains, since rotamers tend to be conserved).
• Experimentally determined protein structures are not perfect (but
still better than models in most cases). There are countless sources of
errors, ranging from poor electron density in the X-ray diffraction
map to simple human errors when preparing the PDB file for
submission.

• Although in principle multiple template modeling is simple


Step 4: Loop Modeling
• In the majority of cases, the alignment between the model and template sequences contains gaps, either in the model sequence (deletions) or in the template sequence (insertions).

• In the first case, one simply omits residues from the template, creating a hole in the model that must be closed.

• In the second case, one takes the continuous backbone from the template, cuts it, and inserts the missing residues.
• At least for short loops (up to 5–8 residues), the various methods have a reasonable chance of predicting a loop conformation that superimposes well on the true structure.
• Surface loops tend to change their conformation due to crystal contacts.

• So if the prediction is made for an isolated protein and then found to differ from the crystal structure, it might still be correct.
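Before any loop can be modeled, the gaps have to be located in the alignment. The sketch below lists each gap run with its type, start column, and length, following the convention used in this step (a gap in the model sequence is a deletion, a gap in the template sequence is an insertion). The function name and the tuple layout are assumptions made for the example; no loop conformation is predicted here.

```python
def find_gaps(aligned_model, aligned_template):
    """Return a list of (kind, start_column, length) for each gap run.

    kind is "deletion" when the model sequence has the gap and
    "insertion" when the template sequence has the gap.
    """
    if len(aligned_model) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    gaps = []
    current_kind = None
    start = 0
    for col, (m, t) in enumerate(zip(aligned_model, aligned_template)):
        if m == "-" and t != "-":
            kind = "deletion"
        elif t == "-" and m != "-":
            kind = "insertion"
        else:
            kind = None
        if kind != current_kind:
            # A run just ended (or started); record the finished run if any.
            if current_kind is not None:
                gaps.append((current_kind, start, col - start))
            current_kind = kind
            start = col
    if current_kind is not None:
        gaps.append((current_kind, start, len(aligned_model) - start))
    return gaps
```

The reported lengths give a quick indication of which insertions fall within the short-loop range where prediction has a reasonable chance of success.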
Step 5: Side-Chain Modeling
• When we compare the side-chain conformations (rotamers) of residues that are conserved in structurally similar proteins, we find that they are often similar.

• It is therefore possible to simply copy conserved residues entirely from the template to the model and achieve a higher accuracy than by copying just the backbone and repredicting the side chains.
• Practically all successful approaches to side-chain placement are at least partly knowledge based.

• They use libraries of common rotamers extracted from high-resolution X-ray structures.

• The choice of a certain rotamer automatically affects the rotamers of all neighboring residues, which in turn affect their neighbors, and so on.
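The coupling between neighboring rotamers makes side-chain placement a combinatorial problem. The toy sketch below shows this with an exhaustive search over every rotamer combination; the energy values, the rotamer labels, and the data layout are hypothetical, and exhaustive search is only feasible for a handful of residues, which is why practical tools rely on other search strategies.

```python
from itertools import product


def best_rotamer_assignment(self_energy, pair_energy):
    """Exhaustively find the lowest-energy rotamer combination.

    self_energy: list with one dict per residue, rotamer label -> energy.
    pair_energy: dict keyed by ((i, rot_i), (j, rot_j)) with i < j giving the
                 interaction energy; missing pairs are treated as 0.
    Returns (tuple of chosen rotamer labels, total energy).
    """
    best_combo = None
    best_total = float("inf")
    options_per_residue = [sorted(options) for options in self_energy]
    for combo in product(*options_per_residue):
        total = sum(self_energy[i][rot] for i, rot in enumerate(combo))
        for i in range(len(combo)):
            for j in range(i + 1, len(combo)):
                total += pair_energy.get(((i, combo[i]), (j, combo[j])), 0.0)
        if total < best_total:
            best_combo, best_total = combo, total
    return best_combo, best_total
```

In a case where each residue's individually preferred rotamer clashes with its neighbor's, the search picks a combination in which one residue gives up its preferred state, which is the neighbor effect described above.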
Step 6: Model Optimization
Optimisation Approaches

• Energy minimisation is used to produce a chemically and conformationally reasonable model protein structure

The two most widely used optimisation algorithms are

• Steepest Descent

• Conjugate Gradients
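As an illustration of the first algorithm, the sketch below applies fixed-step steepest descent to a toy harmonic "bond" energy E(d) = k (d - d0)^2. The force constant, the ideal length, the step size, and the function names are hypothetical values chosen so the example converges; a real force field has many coupled terms and normally uses an adaptive step.

```python
def steepest_descent(gradient, start, step_size=0.1, tolerance=1e-6, max_steps=10_000):
    """Move repeatedly against the gradient until it is nearly zero.

    gradient: function taking a list of coordinates and returning the gradient.
    start: initial coordinates (any sequence of floats).
    """
    x = list(start)
    for _ in range(max_steps):
        g = gradient(x)
        gradient_norm = sum(gi * gi for gi in g) ** 0.5
        if gradient_norm < tolerance:
            break
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


# Toy energy with hypothetical parameters: E(d) = K * (d - D0)^2
K = 1.0
D0 = 1.5


def bond_gradient(x):
    """Gradient of the toy bond energy with respect to the bond length x[0]."""
    return [2.0 * K * (x[0] - D0)]


minimised = steepest_descent(bond_gradient, [2.0])
```

Conjugate gradients uses the same gradient information but combines it with the previous search direction, which typically needs fewer steps near a minimum.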
Step 7: Model Validation
• Every homology model contains errors.
• The number of errors mainly depends on two values:
• 1. The percentage sequence identity between template and
target.
• 2. The number of errors in the template.

• Hence it is essential to check the correctness of the overall fold/structure, errors in localized regions, and stereochemical parameters: bond lengths, angles, geometries
There are two principally different ways to estimate
errors in a structure:

• 1. Calculating the model's energy based on a force field: This method checks if the bond lengths and bond angles are within normal ranges, and if there are lots of bumps in the model (corresponding to a high van der Waals energy).

• 2. Determination of normality indices that describe how well a given characteristic of the model resembles the same characteristic in real structures.
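A very simple version of the "bump" check can be written as a distance test over atom pairs. This is a sketch under assumptions: the 2.2 Å cutoff is an illustrative value rather than a standard, directly bonded pairs are supplied by the caller so they can be skipped, and real validation tools use element-specific radii and a full force field.

```python
from itertools import combinations
from math import dist  # available from Python 3.8


def find_bumps(coords, bonded_pairs=frozenset(), min_distance=2.2):
    """List non-bonded atom pairs that are closer than min_distance.

    coords: list of (x, y, z) tuples, one per atom.
    bonded_pairs: set of frozenset({i, j}) index pairs to ignore.
    min_distance: hypothetical cutoff in the same units as coords.
    Returns a list of (i, j, distance) with i < j.
    """
    bumps = []
    for i, j in combinations(range(len(coords)), 2):
        if frozenset((i, j)) in bonded_pairs:
            continue  # covalently bonded atoms are expected to be close
        d = dist(coords[i], coords[j])
        if d < min_distance:
            bumps.append((i, j, d))
    return bumps
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a sketch but would normally be replaced by a spatial grid or neighbor list for a full protein.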
Structural Comparison Method

• The most common method of comparing two protein structures uses the root mean square deviation (RMSD) metric to measure the mean distance between the corresponding atoms in the two structures after they have been superimposed
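For reference, RMSD is the square root of the mean squared distance between corresponding atoms. The sketch below computes it for two coordinate lists that are assumed to be already superimposed and listed in the same atom order; it does not perform the superposition itself, and the function name is chosen for this example.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length coordinate lists.

    Each list holds (x, y, z) tuples; atom k in coords_a must correspond
    to atom k in coords_b, and the structures must already be superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        squared_sum += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return sqrt(squared_sum / len(coords_a))
```

Which atoms are included (for example Cα only, backbone, or all heavy atoms) changes the value, so the atom selection should be stated whenever an RMSD is reported.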
Accuracy
• The accuracy of the structures generated by homology modeling
is highly dependent on the sequence identity between target and
template.
• Above 50% sequence identity, models tend to be reliable, with
only minor errors in side chain packing and rotameric state, and
an overall RMSD between the modeled and the experimental
structure falling around 1 Å.
• In the 30–50% identity range, errors can be more severe and are
often located in loops.
• Below 30% identity, serious errors occur, sometimes resulting in
the basic fold being mis-predicted. This low-identity region is
often referred to as the "twilight zone" within which homology
modeling is extremely difficult, and to which it is possibly less
suited than fold recognition methods
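The identity ranges above can be turned into a small lookup, sketched below. The thresholds come from the slide; the function name, the returned wording, and the treatment of exactly 30% and 50% as part of the middle range are assumptions, and the result is only a rule of thumb rather than a measure of any particular model's quality.

```python
def expected_reliability(percent_identity):
    """Rough expectation for model quality given target-template identity (%)."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity > 50:
        return "reliable: minor side-chain errors, RMSD around 1 Å"
    if percent_identity >= 30:
        return "intermediate: more severe errors, often in loops"
    return "twilight zone: serious errors, fold may be mis-predicted"
```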
Related Links for further Study

https://microbenotes.com/homology-modeling-working-steps-and-uses/

https://en.wikipedia.org/wiki/Homology_modeling#:~:text=Homology%20modeling%2C%20also%20known%20as,(the%20%22template%22).
