Identification of Modelling Templates
Insights into the three-dimensional (3D) structure of a protein are of great assistance when
planning experiments aimed at the understanding of protein function and during the drug
design process. The experimental elucidation of the 3D-structure of proteins is however
often hampered by difficulties in obtaining sufficient protein, diffracting crystals and many
other technical aspects. Therefore the number of solved 3D-structures increases only
slowly compared to the rate of sequencing of novel cDNAs, and no structural information
is available for the vast majority of the protein sequences registered in the SWISS-PROT
database (nearly 60'000 entries in release 34). In this context it is not surprising that
predictive methods have gained much interest.
Proteins from different sources and with sometimes diverse biological functions can
have similar sequences, and it is generally accepted that high sequence similarity is
reflected by distinct structure similarity. Indeed, the root mean square deviation (rmsd)
of the alpha-carbon co-ordinates for protein cores sharing 50% residue identity is expected
to be around 1 Å. This fact served as the premise for the development of comparative
protein modelling (also often called modelling by homology or knowledge-based
modelling), which is presently the most reliable method. Comparative model building
consists of the extrapolation of the structure for a new (target) sequence from the known
3D-structure of related family members (templates).
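The rmsd figure quoted above can be made concrete with a minimal sketch. It assumes the two sets of alpha-carbon co-ordinates have already been superimposed and paired one to one; the co-ordinates and the function name `rmsd` are illustrative only and do not come from any modelling package.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Sum the squared distance between each pair of matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical, already superimposed alpha-carbon positions (in Å).
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
target_ca = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(rmsd(template_ca, target_ca))  # every atom is displaced by 1 Å, so this prints 1.0
```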
While the high precision structures required for detailed studies of protein-ligand
interaction can only be obtained experimentally, theoretical protein modelling provides the
molecular biologists with "low-resolution" models which hold enough essential
information about the spatial arrangement of important residues to guide the design of
experiments. The rational design of many site-directed mutagenesis experiments could
therefore be improved if more of these "low-resolution" theoretical model structures were
available.
The above procedure might allow the selection of several suitable templates for a
given target sequence, and up to ten templates are used in the modelling process. The best
template structure - the one with the highest sequence similarity to the target - will serve
as the reference. All the other selected templates will be superimposed onto it in 3D. The
3D match is carried out by superimposing corresponding Cα atom pairs selected
automatically from the highest scoring local sequence alignment determined by SIM. This
superposition can then be optimised by maximising the number of Cα pairs in the common
core while minimising their root mean square deviation. Each residue of the reference
structure is then aligned with a residue from every other available template structure if their
Cα atoms are located within 3.0 Å. This generates a structurally corrected multiple
sequence alignment.
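The distance-based pairing step can be sketched as follows. This is a simplified illustration, not the actual implementation: it assumes the template has already been superimposed onto the reference, it uses made-up co-ordinates, and it does not prevent one template residue from being paired with two reference residues. The name `pair_residues` is hypothetical.

```python
import math

CUTOFF = 3.0  # Å, the distance threshold quoted in the text

def pair_residues(reference_ca, template_ca, cutoff=CUTOFF):
    """For each reference residue, return the index of the closest template
    residue whose alpha-carbon lies within `cutoff`, or None if there is none."""
    pairs = []
    for ref in reference_ca:
        best_index, best_dist = None, None
        for j, tmpl in enumerate(template_ca):
            d = math.dist(ref, tmpl)
            if d <= cutoff and (best_dist is None or d < best_dist):
                best_index, best_dist = j, d
        pairs.append(best_index)
    return pairs

# Hypothetical alpha-carbon co-ordinates after superposition (in Å).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
template = [(0.5, 0.0, 0.0), (4.0, 1.0, 0.0), (20.0, 0.0, 0.0)]

print(pair_residues(reference, template))  # [0, 1, None, None]
```

Reference residues paired with `None` correspond to gaps for that template in the structurally corrected alignment.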
The target sequence now needs to be aligned with the template sequence or, if
several templates were selected, with the structurally corrected multiple sequence
alignment. This can be achieved by using the best-scoring diagonals obtained by SIM.
Residues which should not be used for model building, for example those located in non-
conserved loops, will be ignored during the modelling process. Thus, the common core of
the target protein and the loops completely defined by at least one supplied template
structure will be built.
Since the loop building only adds Cα atoms, the backbone carbonyl groups and nitrogens
must be completed in these regions. This step can be performed by using a library of
pentapeptide backbone fragments derived from the PDB entries determined with a
resolution better than 2.0 Å. These fragments are then fitted to overlapping runs of five Cα
atoms of the target model. The co-ordinates of each central tripeptide are then averaged for
each target backbone atom (N, C, O) and added to the model. This process yields modelled
backbones that differ from experimental co-ordinates by approx. 0.2 Å rms.
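The bookkeeping behind the overlapping windows can be sketched as below. The fragment fitting itself is not shown; the sketch only illustrates which five-residue windows contribute a central-tripeptide estimate to each residue, and how several estimates for one atom would be averaged. The function names and the sample co-ordinates are assumptions made for illustration.

```python
from collections import defaultdict

def window_coverage(n_residues, window=5):
    """Map each residue index to the start positions of the windows whose
    central tripeptide (the middle three residues) covers it."""
    coverage = defaultdict(list)
    for start in range(n_residues - window + 1):
        for residue in range(start + 1, start + window - 1):
            coverage[residue].append(start)
    return dict(coverage)

def average_position(estimates):
    """Average several (x, y, z) estimates of the same backbone atom."""
    n = len(estimates)
    return tuple(sum(p[k] for p in estimates) / n for k in range(3))

# A seven-residue stretch gives three windows, starting at residues 0, 1 and 2.
print(window_coverage(7))

# Hypothetical estimates for one carbonyl oxygen from three fitted fragments.
print(average_position([(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]))
```

In this sketch, residues near the ends of a stretch receive fewer estimates than those in the middle, and the terminal residues receive none.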
For many of the protein side chains there is no structural information available in
the templates. These cannot therefore be built during the framework generation and must
be added later. The number of side chains that need to be built is dictated by the degree of
sequence identity between target and template sequences. To this end one uses a table of
the most probable rotamers for each amino acid side chain depending on their backbone
conformation. All the allowed rotamers of the residues missing from the structure are
analysed to see if they are acceptable by a van der Waals exclusion test. The most favoured
rotamer is added to the model. The atoms defining the χ1 and χ2 angles of incomplete side
chains can be used to restrict the choice of rotamers to those fitting these angles. If some
side chains cannot be rebuilt in a first attempt, they will be assigned first in a second
pass. This allows some side chains to be rebuilt even if the most probable allowed rotamer
of a neighbouring residue already occupies some of this portion of space. The latter may
then switch to a less probable but allowed rotamer. If not all of the side chains
can be added, an additional tolerance of 0.15 Å can be introduced in the van der Waals
exclusion test and the procedure repeated.
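A minimal sketch of the rotamer choice for a single residue is given below. It assumes that a clash means two atoms lie closer than the sum of their radii, treats the residue in isolation (the second pass with neighbour switching is not modelled), and uses invented probabilities, co-ordinates and radii. Only the 0.15 Å tolerance is taken from the text.

```python
import math

def clashes(rotamer_atoms, environment_atoms, tolerance=0.0):
    """Return True if any rotamer atom overlaps an environment atom.
    Atoms are (x, y, z, radius); `tolerance` shrinks the allowed contact distance."""
    for xa, ya, za, ra in rotamer_atoms:
        for xb, yb, zb, rb in environment_atoms:
            if math.dist((xa, ya, za), (xb, yb, zb)) < ra + rb - tolerance:
                return True
    return False

def pick_rotamer(rotamers, environment_atoms, extra_tolerance=0.15):
    """rotamers: list of (probability, atoms). Try the most probable rotamer first;
    if none passes the strict test, repeat with the extra tolerance.
    Returns (rotamer index, tolerance used) or None."""
    ranked = sorted(range(len(rotamers)), key=lambda i: rotamers[i][0], reverse=True)
    for tolerance in (0.0, extra_tolerance):
        for i in ranked:
            if not clashes(rotamers[i][1], environment_atoms, tolerance):
                return i, tolerance
    return None

# Hypothetical data: one neighbouring atom and three candidate rotamers.
environment = [(0.0, 0.0, 0.0, 1.7)]
rotamers = [
    (0.6, [(3.0, 0.0, 0.0, 1.7)]),  # clashes even with the extra tolerance
    (0.3, [(3.3, 0.0, 0.0, 1.7)]),  # passes only with the extra tolerance
    (0.1, [(5.0, 0.0, 0.0, 1.7)]),  # passes the strict test
]

print(pick_rotamer(rotamers, environment))      # (2, 0.0)
print(pick_rotamer(rotamers[:2], environment))  # (1, 0.15)
```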
Model refinement
MOLECULAR MODELLING
Methods.
The amino acid sequence of this protein was retrieved from www.pubmed.com and then
submitted to geno3D. Geno3D returned seven templates; of these, two
templates with 62.6 % identity were selected for modelling. The modelling results were
received by e-mail and analysed further. Details are given below.
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG
GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
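An identity percentage such as the one quoted for the templates can be computed from a pairwise alignment roughly as follows. The exact definition used by Geno3D is not stated here, so this sketch assumes identity is counted over aligned columns that contain no gap; the short aligned strings are invented for illustration and are not the actual template alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the aligned, gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Hypothetical aligned fragments: 7 gap-free columns, 6 of them identical.
target_fragment = "MEEPQSDP"
template_fragment = "MEE-QSEP"

print(round(percent_identity(target_fragment, template_fragment), 1))
```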
Model 1 was selected because it had the lowest energy and the highest number of amino
acids in the core region. A wire frame model (Fig 2) and a model showing secondary
structure with residues (Fig 3) were obtained using Rasmol software.
Fig 2: Wire frame structure of model-1, generated by Rasmol Software
Fig 3: Ribbon model of model-1, generated by Rasmol Software