
Structural bioinformatics

The document provides an overview of structural bioinformatics, focusing on protein structures, their determination methods, and the significance of their shapes in biological functions. It discusses various aspects such as the Protein Data Bank (PDB), visualization techniques, and the interactions that stabilize protein structures. Additionally, it highlights the importance of understanding protein structures for drug design and disease mechanisms.


Structural bioinformatics

Dr. Zsuzsanna Dosztányi

dosztanyi@caesar.elte.hu

7 October 2019
Basic features of protein structures
Structure determination methods
PDB database
Visualization of structures
Analysis of protein structures
Practical
PDB
Visualization of protein structures
• “It has not escaped our notice that the specific pairing we
have postulated immediately suggests a possible copying
mechanism for the genetic material.”
Wide range of functions of proteins
Amino acid, peptide bond
[Figure: two amino acids (amino group, carboxyl group, side chain) are joined by a peptide bond through condensation (release of a water molecule), giving a dipeptide; many amino acids linked this way form a polypeptide chain.]
20 Amino acids
How many different sequences can a
100 amino acid long protein have?

20^100 ≈ 10^130

The number of protons in the observable universe is around 10^80

Proteins are usually longer


The longest one is around 30 000 AA
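
The size of sequence space quoted above can be checked with a few lines of Python. This is only a quick arithmetic sketch; the variable names are illustrative.

```python
import math

AMINO_ACIDS = 20   # number of standard amino acids
LENGTH = 100       # chain length used in the example above

# Python integers are unbounded, so the exact count can be computed directly.
n_sequences = AMINO_ACIDS ** LENGTH

# log10 of the count, to compare with the 10^130 estimate.
exponent = LENGTH * math.log10(AMINO_ACIDS)

print(f"20^100 has {len(str(n_sequences))} decimal digits")
print(f"log10(20^100) = {exponent:.1f}")
```

The exponent comes out slightly above 130, which is why the count is usually rounded to 10^130.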
Globular proteins

They adopt a well-defined compact structure
Why are protein structures
interesting?
Function is heavily dependent on the shape of the protein
Atomic-level understanding of biological processes (DNA, RNA, enzymes, hormones, receptors)
Understanding the molecular basis of diseases
Drug design, protein-drug interactions
“Protein structure is the three-dimensional
arrangement of atoms in a protein molecule”
Quaternary structure
Myoglobin: monomer
Hemoglobin: multi-subunit protein complex
– Homo- and hetero-oligomers

 
Interactions stabilizing proteins

Hydrogen bond between side chains and the main chain
Hydrogen bond between side chains
Ionic bond
Hydrophobic effect
Van der Waals interaction
Disulphide bridge
Van der Waals interaction

Acts between every pair of atoms

Has an attractive and a repulsive term

VdW radius
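
One common way to model an interaction with an attractive and a repulsive term is the 12-6 Lennard-Jones potential. The slides do not name a specific functional form, so the sketch below is an assumption, and the well depth and minimum-energy distance are illustrative values rather than parameters from any force field.

```python
def lennard_jones(r, epsilon=0.2, r_min=3.8):
    """12-6 Lennard-Jones energy at separation r (in Å).

    epsilon: depth of the energy well (kcal/mol), illustrative value.
    r_min:   separation of minimum energy (Å), roughly the sum of the
             two VdW radii; illustrative value.
    """
    ratio = r_min / r
    repulsive = ratio ** 12          # dominates at short range
    attractive = -2.0 * ratio ** 6   # dominates at longer range
    return epsilon * (repulsive + attractive)


for r in (3.0, 3.8, 6.0):
    print(f"r = {r:.1f} Å  E = {lennard_jones(r):+.3f} kcal/mol")
```

The energy is strongly positive when the atoms overlap, reaches its minimum (equal to -epsilon) at r_min, and decays towards zero at larger separations.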
Hydrophobic effect

Water molecules
Hydrophobic effect: dominated by entropic factors

Hydrophobicity
Hydrophobic residues
Hydrophilic residues

Protein in isolation vs. protein in aqueous environment
H-bond

A hydrogen bond forms when a hydrogen atom that is covalently bound to a highly
electronegative atom (F, N, O) interacts with another electronegative atom from a
different functional group, i.e. the hydrogen atom bridges two other atoms.
H-bonds in biological systems
Electrostatic interactions

- ε is the dielectric constant
- it characterizes how much the charges are shielded by the environment
- ε is 1 in vacuum, around 4 inside a protein, and ~80 in water
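
The effect of the dielectric constant can be illustrated with Coulomb's law. The sketch below assumes two point charges and a uniform dielectric, which is a strong simplification of a real protein environment; the 4 Å separation is an arbitrary example value.

```python
# Coulomb constant in units that give kcal/mol for charges in units of the
# elementary charge and distances in Å.
COULOMB_CONSTANT = 332.06


def coulomb_energy(q1, q2, r, dielectric):
    """Coulomb energy (kcal/mol) of two point charges q1, q2 at distance r (Å)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


# Same ion pair (+1 / -1 at 4 Å) in the three environments listed above.
for name, eps in (("vacuum", 1.0), ("protein interior", 4.0), ("water", 80.0)):
    energy = coulomb_energy(+1.0, -1.0, 4.0, eps)
    print(f"{name:17s} eps = {eps:4.0f}  E = {energy:8.2f} kcal/mol")
```

Under these assumptions the same ion pair is 80 times weaker in water than in vacuum.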
Strength of interactions
Main chain conformation
N-terminal, C-terminal

Torsion angle: clockwise (+), counter-clockwise (-)

The main chain φ and ψ torsion angles of a protein cannot take arbitrary values, there are preferred conformations.

Rotation around the N-Cα bond: phi (φ)
Rotation around the Cα-C' bond: psi (ψ)
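
A torsion angle is defined by four consecutive atoms: φ by C'(i-1), N, Cα, C' and ψ by N, Cα, C', N(i+1). The sketch below computes such an angle from four 3D points using only the standard library; the function and helper names are illustrative, and the coordinates in the example are made up to give an easily checked result.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)                     # unit vector along the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))    # part of b0 perpendicular to the bond
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))    # part of b2 perpendicular to the bond
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four made-up points forming a right-angle torsion.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying the function to the backbone atoms of every residue gives the (φ, ψ) pairs that are plotted on a Ramachandran plot.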
Ramachandran plot

● Using the φ, ψ angles we can evaluate a structure
● Each secondary structure has a distinct region
● Glycines and prolines are not represented, they have special conformational preferences
The (right-handed) α helix
Accounts for approx. 30% of the residues in globular proteins

5-40 residues in length (10 on average)

Individual H-bonds are relatively weak, but together they contribute significantly to helix stability
β sheet conformation
Accounts for approx. 30% of the residues in globular proteins

Strands of 5-10 residues run side by side (parallel or antiparallel)

Strands are held together by H-bonds
Loops and turns
Typically hydrophilic in character. They occur on the outer regions of the protein and form H-bonds with water and other molecules

Often form binding regions and active sites in enzymes and receptors
Domains: many proteins feature distinct compact
structural units

Src protein kinase


Domains
● Compact units with globular-like structures
● Domains are basic building blocks of proteins
● Typically fulfill a well-specified function
● Can appear in various biological contexts
SH2 domain
How to find a specific structure?
• Database
• Worldwide Protein Data Bank (wwPDB)
• 3 entry points:
  • PDB Europe (PDBe)
  • PDB Japan (PDBj)
  • RCSB PDB
• Easy to use
• Search by name, ID
• Direct links
• Well maintained
Where does a structure come from?

X-ray crystallography

NMR

Electron microscopy
Where do these structures come from?

Most structures are solved by X-ray crystallography (86%)
X-ray crystallography
X-ray:

- X-rays have short wavelengths (approx. 1.5 Å), which are needed to measure typical atom-atom distances
- gives information about electron density, and the model has to be fitted into that density
- crystallization artefacts
- non-physiological environment
- no information on hydrogens
NMR

- in solution
- usually yields a structural ensemble that fulfills the distance constraints
- only small proteins
- less precise model
- usable for flexible proteins as well
Cryo-EM (atomic resolution)
Nobel Prize in Chemistry 2017
PDB statistics
PDB ID: unique identifier

Each atomic coordinate file in the Protein Data Bank has a unique identifier
composed of exactly 4 characters. The first one is always a digit, the rest
can be either a digit or a letter.

There are over 400,000 possible 4-character PDB IDs (419,904, or 466,560 if "0"
can also be the first character).
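
The two counts follow from the rule above: 9 or 10 choices for the first character and 36 (10 digits + 26 letters) for each of the other three. A short sketch of the arithmetic, together with a simple format check (the helper name `looks_like_pdb_id` is hypothetical, and the check only tests the format, not whether the entry exists):

```python
import re

ALPHANUMERIC = 36  # 10 digits + 26 letters, case-insensitive

ids_first_1_to_9 = 9 * ALPHANUMERIC ** 3    # first character 1-9
ids_first_0_to_9 = 10 * ALPHANUMERIC ** 3   # first character 0-9

# One digit followed by three digits or letters.
PDB_ID_PATTERN = re.compile(r"[0-9][0-9A-Za-z]{3}")


def looks_like_pdb_id(text):
    """Return True if text has the 4-character PDB ID format described above."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


print(ids_first_1_to_9, ids_first_0_to_9)
print(looks_like_pdb_id("1mbn"), looks_like_pdb_id("mbn1"))
```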

Examples:
• 1mbn - 1973, the first protein structure model, myoglobin
• 1tna - 1975, the first RNA structure, yeast phenylalanine transfer RNA
• 1bna - 1980, the first B-DNA double helix structure (determined by X-ray
crystallography 27 years after the theoretical structural model proposed by
Watson & Crick in 1953)
• 2hhd - human hemoglobin, (deoxy form)
• 9ins - insulin
The .pdb file format
HEADER EXTRACELLULAR MATRIX 22-JAN-98 1A3I
TITLE X-RAY CRYSTALLOGRAPHIC DETERMINATION OF A COLLAGEN-LIKE
TITLE 2 PEPTIDE WITH THE REPEATING SEQUENCE (PRO-PRO-GLY)
...
EXPDTA X-RAY DIFFRACTION
AUTHOR R.Z.KRAMER,L.VITAGLIANO,J.BELLA,R.BERISIO,L.MAZZARELLA,
AUTHOR 2 B.BRODSKY,A.ZAGARI,H.M.BERMAN
...
REMARK 350 BIOMOLECULE: 1
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C
REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000
REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000
...
SEQRES 1 A 9 PRO PRO GLY PRO PRO GLY PRO PRO GLY
SEQRES 1 B 6 PRO PRO GLY PRO PRO GLY
SEQRES 1 C 6 PRO PRO GLY PRO PRO GLY
...
ATOM 1 N PRO A 1 8.316 21.206 21.530 1.00 17.44 N
ATOM 2 CA PRO A 1 7.608 20.729 20.336 1.00 17.44 C
ATOM 3 C PRO A 1 8.487 20.707 19.092 1.00 17.44 C
ATOM 4 O PRO A 1 9.466 21.457 19.005 1.00 17.44 O
ATOM 5 CB PRO A 1 6.460 21.723 20.211 1.00 22.26 C
...
HETATM 130 C ACY 401 3.682 22.541 11.236 1.00 21.19 C
HETATM 131 O ACY 401 2.807 23.097 10.553 1.00 21.19 O
HETATM 132 OXT ACY 401 4.306 23.101 12.291 1.00 21.19 O
...
The .pdb file format

Fields of an ATOM record:
atom number and name
residue type, chain ID, residue number
3D coordinates
occupancy
B-factor
atom type
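
In the legacy PDB format these fields sit at fixed column positions, so a record is read by slicing rather than by splitting on whitespace. The sketch below assumes the standard fixed-column layout; the function name is illustrative, and the example line re-creates the CA atom of PRO A 1 from the 1A3I excerpt with explicit column widths, because the excerpt as shown here has lost its original spacing.

```python
def parse_atom_record(line):
    """Split one ATOM/HETATM line into its fields (fixed-column layout)."""
    return {
        "record": line[0:6].strip(),       # columns 1-6
        "serial": int(line[6:11]),         # columns 7-11, atom number
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20, residue type
        "chain_id": line[21].strip(),      # column 22
        "res_seq": int(line[22:26]),       # columns 23-26, residue number
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "occupancy": float(line[54:60]),   # columns 55-60
        "b_factor": float(line[60:66]),    # columns 61-66
        "element": line[76:78].strip(),    # columns 77-78, atom type
    }


# Example line built with explicit widths so that every field lands in its column.
example = (
    f"{'ATOM':<6}{2:>5} {' CA':<4}{'':1}{'PRO':>3} {'A':1}{1:>4}{'':1}   "
    f"{7.608:>8.3f}{20.729:>8.3f}{20.336:>8.3f}{1.00:>6.2f}{17.44:>6.2f}"
    f"{'':10}{'C':>2}"
)

atom = parse_atom_record(example)
print(atom["atom_name"], atom["res_name"], atom["chain_id"], atom["res_seq"])
print(atom["x"], atom["y"], atom["z"], atom["b_factor"])
```

Slicing by column is more robust than splitting on whitespace, because neighbouring fields in real files can touch each other or be left blank.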
Model
All protein structures are models! Structures are not directly measured, but are
generated as models that best fit the collected experimental data.
How good is a structure?
What is the resolution of the collected data? (X-ray only)

R-factor/free R-factor (X-ray)
How well does the model fit the experimental measurements?

There is no standard for NMR structures

Ramachandran plot
Geometry and stereochemistry

How similar are these to known good structures?
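
The crystallographic R-factor compares the observed structure-factor amplitudes with those calculated from the model: R = Σ | |F_obs| - |F_calc| | / Σ |F_obs|. The free R-factor uses the same formula on a set of reflections that was held out of refinement. A minimal sketch with made-up amplitudes (the function name and numbers are illustrative, and no scaling between the two data sets is applied):

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|) over matched reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Made-up amplitudes for three reflections.
observed = [10.0, 20.0, 30.0]
calculated = [9.0, 22.0, 30.0]
print(f"R = {r_factor(observed, calculated):.3f}")
```

A value of 0 would mean perfect agreement between model and data; lower values indicate a better fit.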


Resolution (X-ray)

• Describes the reliability of the determined atomic coordinates

Very low: >4.0 Å
Individual coordinates cannot be interpreted
Low: 3.0-4.0 Å
The fold is recognizable
Average: 1.8-3.0 Å
The majority of the structure is correct, with incorrect rotamers and unreliable surface loop conformations
Good: 1.0-1.8 Å
Atomic level: <1.0 Å

Resolution can change for each position!
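
The resolution categories above can be written as a small lookup function, for example to annotate a list of PDB entries. The function name is hypothetical, and how the shared boundary values (1.0, 1.8, 3.0, 4.0 Å) are assigned is an arbitrary choice made here.

```python
def classify_resolution(resolution):
    """Map an X-ray resolution in Å to the quality categories listed above."""
    if resolution > 4.0:
        return "very low"
    if resolution >= 3.0:
        return "low"
    if resolution >= 1.8:
        return "average"
    if resolution >= 1.0:
        return "good"
    return "atomic level"


for value in (4.5, 3.5, 2.0, 1.5, 0.9):
    print(f"{value:.1f} Å -> {classify_resolution(value)}")
```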
Resolution
Cross-references with other
databases
Visualizing protein structures
1. PDB coordinate file
2. Visualization program
E.g.: RasMol, PyMOL, Chimera, VMD, Jmol, Swiss-PdbViewer
3. Computer
4. Molecule image
The inside of the protein is tightly
packed
Hydrophobic core
Hydrophobic side chains go into the core of the
molecule – but the main chain is highly polar.
The polar groups (C=O and NH) are neutralized
through formation of H-bonds.

Myoglobin

surface buried
Secondary structures are stabilized
by H-bonds
Secondary structure determination
Can be based on:
H-bond patterns
Dihedral angles

Automatic determination using algorithms:
DSSP
STRIDE

3 (alpha, beta, coil) or more categories (e.g. turn, other helix types)

The methods do not agree 100%
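
DSSP assigns one letter per residue from a larger alphabet (for example H, G, I for helix types, E and B for strand and bridge, T and S for turn and bend). A frequently used way to reduce this to three categories is sketched below. Several reduction conventions exist, so this particular mapping is an assumption, and the input string is made up.

```python
# One possible reduction of DSSP letters to helix (H), strand (E) and coil (C).
# Other conventions exist, e.g. treating G, I or B as coil.
THREE_STATE = {
    "H": "H", "G": "H", "I": "H",   # helix types
    "E": "E", "B": "E",             # extended strand, isolated bridge
}


def reduce_to_three_states(dssp_string):
    """Map every DSSP letter to H, E or C (anything not listed becomes C)."""
    return "".join(THREE_STATE.get(letter, "C") for letter in dssp_string)


print(reduce_to_three_states("HHGGIEEBTS-"))
```

Differences between such conventions are one of the reasons why assignments from different sources cannot be compared letter by letter without care.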


Globin evolution
Similarity between two structures

Superposition: minimizing distances between equivalent positions

RMSD (root-mean-square deviation)

The most commonly used function for measuring structural similarity

RMSD is the square root of the mean squared distance between equivalent atoms of superimposed structures
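
A minimal sketch of the RMSD calculation, assuming the two structures are already superimposed and that their atoms are listed in the same (equivalent) order. Finding the optimal superposition is a separate step that is not shown here; the coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [
        sum((p - q) ** 2 for p, q in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Two made-up "structures" of two atoms each, shifted by 1 Å along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))
```

Because the distances are squared before averaging, a few badly fitting atoms raise the RMSD more than they would raise a plain average distance.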
Structures of evolutionarily related
proteins are usually similar
1ebhA: enolase
1mns: mandelate racemase

Sequence identity: 25%

Active center is very similar
Similar chemical reactions
Different substrates

The structure is usually more conserved than the sequence


Structural classification
We can group similar and evolutionarily related
protein structures using classification
Example
CATH
http://www.cathdb.info/
SCOP
http://scop2.mrc-lmb.cam.ac.uk/
Structural classification

CATH
Fold - topology

Proteins belonging to the same fold contain roughly the same secondary structure elements in the same order and in a similar spatial configuration.
Homologous and analogous structures
Homologous proteins evolved from a common
ancestor via divergence, and share the same
fold
Analogous proteins share the same fold but do
not have an evolutionary relationship (or it is
undetectable)
Some folds are more common (due to physical
effects)
The number of folds is limited
Membrane proteins

Important for:
Energy production
Transport
Cell-cell junction
Signaling

Drug targets
Hydrophobicity of membrane
proteins

Cross-section

Aquaporin Aquaporin
Structure determination of
transmembrane proteins

Approx. 2% of
PDB structures
Conformational changes
Proteins are dynamic molecules

X-ray: B-factor, missing parts of the structure
NMR: structural variability

tegniddsliggnasaegpegegtestv

Missing regions in the protein structure
NMR structures with high structural variability
P53 protein

From these parts we only have structures that are in complexes. Why?
Intrinsically disordered protein
● Proteins that do not form a well-defined three-dimensional structure
● More than 50% of human proteins are either fully or partially disordered
● Typical functions: regulation, apoptosis, signal transduction, stress response
● Malfunctions can lead to diseases
− p53 → cancer
− τ protein → Alzheimer's disease
− Synuclein → Parkinson's disease
Ensemble characterization for IDPs
Experimental methods cannot detect a single conformation, only time or ensemble averages

A combination of methods is needed (NMR, SAXS) + molecular dynamics/modelling

These methods are used to characterize:
Radius of gyration
Transient secondary structure elements
Transient long-range contacts
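
As an illustration of the first quantity, the radius of gyration of a single conformation can be computed as the root-mean-square distance of the atoms from their centroid. The sketch assumes equal weights for all atoms (no mass weighting) and uses made-up coordinates; for a disordered protein the value would be averaged over the whole ensemble.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) coordinates."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    centroid = tuple(sum(atom[i] for atom in coords) / n for i in range(3))
    mean_sq = sum(
        sum((atom[i] - centroid[i]) ** 2 for i in range(3)) for atom in coords
    ) / n
    return math.sqrt(mean_sq)


# Two made-up points 2 Å apart: each lies 1 Å from the centroid.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```

Extended conformations give larger values than compact ones with the same number of atoms, which is what makes the quantity useful for describing disordered ensembles.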
Extended structure-function paradigm
