Secondary Structure Prediction
REG NO : 810022509005
COURSE : M. Tech (BIOTECHNOLOGY)
SUB NAME : COMPUTATIONAL BIOLOGY
SUB CODE : BY4202
TITLE : SECONDARY STRUCTURE PREDICTION
Secondary Structure Prediction
When a novel protein is the topic of interest and its structure is unknown, a solid method
for predicting its secondary (and eventually tertiary) structure is desired. A variety of
computational techniques are employed in making secondary structure predictions for a
particular protein sequence, and all work with the goal of differentiating between
helix (H), strand (S), and coil/loop (C) regions. Though these three classifications are a
simplification of the terminology used in presenting solved protein structures, they provide
enough information to characterize the general structure of a protein. Sequence alignment
is an important tool in secondary structure prediction, as highly conserved regions of
related sequences generally correlate with specific secondary structure elements that are
necessary for proper protein function. In this light, it is important to consider that many
methods which use sequence alignment as an initial step in developing models for prediction
are observing trends in sequence conservation (which residues can be conservatively
substituted for each other, etc.). Such intuition is a strong first step in developing a solid
prediction model that fits the utilized test data. Early secondary structure prediction
methods (such as Chou-Fasman and GOR, outlined below) had a 3-state accuracy of
50-60%. (They initially reported higher accuracy, but this was found to be inflated once
they were tested against proteins outside of the training set.) Today's methods have an
accuracy of > 70%.
1. Calculate propensities from a set of solved structures. For each of the 20 amino acids i
and each structure type s, calculate the propensity

P(i, s) = (n_{i,s} / n_s) / (n_i / N)

where n_{i,s} is the number of residues of type i found in structure s, n_s is the total
number of residues in structure s, n_i is the total number of residues of type i, and N is
the total number of residues. That is, we determine the probability that amino acid i is in
each structure, normalized by the background probability that i occurs at all. For example,
suppose there are 20,000 amino acids in the database, of which 2,000 are serine, and there
are 5,000 amino acids in helical conformation, of which 500 are serine. Then the helical
propensity for serine is (500/5000)/(2000/20000) = 0.1/0.1 = 1. A propensity above 1
indicates that an amino acid favors a structure; a propensity below 1 indicates that it
disfavors it.
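The propensity calculation above can be sketched directly from the counts. This is a minimal illustration; the function and variable names are my own, not from any standard library:

```python
# Sketch of the propensity calculation described above.
def propensity(n_res_in_struct, n_struct, n_res, n_total):
    """(Fraction of the structure made of this residue) divided by
    (background fraction of this residue in the whole database)."""
    return (n_res_in_struct / n_struct) / (n_res / n_total)

# Serine example from the text: 20,000 residues total, 2,000 serines,
# 5,000 helical residues, 500 of which are serine.
p_helix_ser = propensity(500, 5000, 2000, 20000)
print(p_helix_ser)  # (500/5000)/(2000/20000) = 0.1/0.1 = 1.0
```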
Figure: In the Chou-Fasman method, nucleation sites are found along the protein using a
heuristic rule, and then extended.
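The nucleation step in the figure can be sketched as a sliding-window scan: a 6-residue window in which at least 4 residues are helix formers (propensity > 1) marks a nucleation site, which is then extended. The sketch below covers only the nucleation scan, with a tiny made-up propensity table standing in for the full Chou-Fasman values:

```python
# Illustrative helical propensities for a few residues (made-up
# stand-ins, not the published Chou-Fasman table).
P_HELIX = {"A": 1.42, "E": 1.51, "L": 1.21, "G": 0.57, "P": 0.57, "S": 0.77}

def find_helix_nuclei(seq, win=6, min_formers=4):
    """Return start indices of windows with enough helix formers.
    Extension of each nucleus outward along the sequence is omitted."""
    nuclei = []
    for i in range(len(seq) - win + 1):
        formers = sum(P_HELIX.get(aa, 1.0) > 1.0 for aa in seq[i:i + win])
        if formers >= min_formers:
            nuclei.append(i)
    return nuclei

print(find_helix_nuclei("AELAELGPGPSS"))  # nuclei in the helix-rich prefix
```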
The GOR method estimates the probability that an amino acid has a particular structure,
given the sequence around it. It looks at a window of 17 amino acids: the central residue
at position j together with the 8 residues on either side. That is, it estimates

P(S_j | a_{j-8}, ..., a_j, ..., a_{j+8})

where S_j is the structural state at position j and a_k denotes the amino acid at
position k.
There are far too many possible sequences of length 17 to make calculating the above
probabilities feasible. Instead, it is assumed that these probabilities can be estimated using
just pairwise probabilities. We omit the details, but the overall idea is similar to the
log-odds ratios we have studied before, except that pairwise dependencies are considered.
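As a rough sketch of this windowed scoring idea, simplified to single-residue information terms (as in the original GOR method; the pairwise version adds analogous terms for pairs of window positions), with made-up log-odds values:

```python
import math

# `info` maps (offset from center, amino acid) -> log-odds contribution
# toward helix; the values below are illustrative, not trained ones.
HALF_WIN = 8  # window of 17 = center residue + 8 on each side

def gor_score(seq, j, info):
    """Sum the log-odds contributions of residues in the window at j."""
    score = 0.0
    for d in range(-HALF_WIN, HALF_WIN + 1):
        k = j + d
        if 0 <= k < len(seq):
            score += info.get((d, seq[k]), 0.0)
    return score

info = {(0, "A"): math.log(1.4), (0, "G"): math.log(0.6),
        (-1, "E"): math.log(1.5)}
print(gor_score("EAG", 1, info))  # contributions from E at -1 and A at 0
```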
There are two main schools of thought on secondary structure prediction: statistical
analysis, and reliance on biophysical principles to predict structure from sequence. These
methods are explored below.
Probability Based Methods
A predicted state sequence is then cleaned up to its most likely structural assignment;
for example, a helical run interrupted by a single spurious coil state would be corrected
to an unbroken helix.
This type of correction is needed to make the predictions logical. Proteins contain
conserved domains, with each domain spanning a certain number of amino acids, sometimes as
few as three or four, with others extending to 30, 40, or more. It is impossible to come up
with a Markov transition matrix that will always extend domains correctly in a model of
this order, which makes taking a consensus over a region necessary for biologically
relevant predictions, as in the example above. Models of this type are limiting, which led
to the development of the more involved models presented below.
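The consensus clean-up described above can be approximated with a simple majority filter over the predicted state string; the window size of 5 is an assumption for illustration, not a value from the text:

```python
from collections import Counter

def smooth(states, win=5):
    """Reassign each position to the majority state in a small window,
    absorbing isolated states that break up a structural run."""
    half = win // 2
    out = []
    for i in range(len(states)):
        window = states[max(0, i - half): i + half + 1]
        out.append(Counter(window).most_common(1)[0][0])
    return "".join(out)

print(smooth("HHHHCHHHH"))  # the isolated coil state is absorbed
```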
With too large a window size, however, a short loop region is lost. This is the largest
problem with the window-based method; it can be improved by using multiple window sizes in
conjunction with each other and developing a heuristic to find a consensus structural
prediction for a given stretch of sequence. As with most structure prediction methods,
developing a biologically sound scoring system is crucial to algorithmic success.
Biophysical Principles
Beyond simply looking at which residues follow which others in structures, it has also
become important to examine biophysical principles in structure prediction. From
electrostatic interactions to simple geometric observations (e.g., you simply can't place
multiple prolines in a row in a helix, as the chain cannot turn tightly enough), there are
many ways that algorithms can be improved by using these principles in addition to simple
sequence information.
As mentioned before, one way to improve the development of scoring matrices is to use multiple
sequence alignment data in training rather than just single sequence data, demonstrated by Salamov and
Solovyev (1995), who improved previous accuracy of 68 percent to 72 percent.
Neural Networks
A newer approach to predicting secondary structure is the use of neural networks. These
networks consist of multiple layers composed of nodes. Each node has a variable number of
inputs, anywhere from 1 to n, and one output. The majority of neural networks used in
biological sequence analysis, for both DNA and amino acid sequences, consist of an input
layer, a middle "hidden" layer, and an output layer. This neural network architecture is
depicted in the figure below.
Figure: A three-layer neural network; each node i combines its weighted inputs into a
single output.
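A minimal sketch of the forward pass through such a three-layer network, with arbitrary illustrative weights and sigmoid activations (the layer sizes and values here are assumptions for illustration only):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Input layer -> one hidden layer -> a single output node."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for w in w_hidden]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)))

x = [1.0, 0.0, 1.0]  # e.g. a simple numeric encoding of a residue window
w_hidden = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
w_out = [1.0, -1.0]
print(forward(x, w_hidden, w_out))  # a value in (0, 1)
```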
Perceptrons, also called threshold units, are a simple method for classifying
input vectors, or examples, into one of two categories. They function similarly
to one-layer neural networks; in fact, we will see that full neural networks are
essentially built from many threshold units.
If the weights are not known in advance, the perceptron must be trained. For
that we need a training set: a set S of input vectors for which we know the
desired (target) answer. Ideally, the goal of training is to find a set of weights
such that the perceptron returns the correct answer for all of the training
examples, with the hope that such a perceptron will have good performance on
examples it has never seen. The training set should contain both positive and
negative examples. For example, if we were to build a perceptron to recognize
α-helices, then we should have sequences that are part of α-helices (positive
examples), as well as sequences that are not (negative examples).
[1] Bystroff C, Thorsson V, and Baker D. (2000) HMMSTR: a Hidden Markov Model for
Local Sequence-Structure Correlations in Proteins. Journal of Molecular Biology,
301, 173-190.
[2] Durbin R, Eddy S, Krogh A, and Mitchison G. (2002) Biological Sequence Analysis.
Cambridge, United Kingdom: Cambridge University Press.
See Chapter 6 for information on multiple sequence alignments, and Chapter 7 for
information on dealing with complex phylogenetic relationships. A review of the
Feng-Doolittle alignment can also be found on pages 145-146.
[3] Jones DT. (1999) Protein Secondary Structure Prediction Based on Position-specific
Scoring Matrices. Journal of Molecular Biology, 292, 195-202.
[4] Koehl P. (2001) Protein Structure Similarities. Current Opinion in Structural
Biology, 11, 348-353.
[5] Presnell SR, Cohen BI, and Cohen FE. (1992) A Segment-based Approach to Protein
Secondary Structure Prediction. Biochemistry, 31, 983-993.
[6] Salamov AA, Solovyev VV. (1995) Prediction of Protein Secondary Structure by
Combining Nearest-neighbor Algorithms and Multiple Sequence Alignments. Journal of
Molecular Biology, 247, 11-15.
[7] Sternberg M. (1996) Protein Structure Prediction. IRL Press.