Structural bioinformatics
dosztanyi@caesar.elte.hu
7 October 2019
Basic features of protein structures
Structure determination methods
PDB database
Visualization of structures
Analysis of protein structures
Practical
PDB
Visualization of protein structures
• “It has not escaped our notice that the specific pairing we
have postulated immediately suggests a possible copying
mechanism for the genetic material.” (Watson & Crick, 1953)
Wide range of functions of proteins
Figure: amino acids (amino group, carboxyl group) are linked by peptide bonds into a polypeptide chain
20 Amino acids
How many different sequences can a 100-amino-acid-long protein have?
$20^{100} \approx 10^{130}$
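The estimate is simple counting: each of the 100 positions can hold any of the 20 amino acids, so there are 20^100 sequences, and log10(20^100) = 100 · log10(20) ≈ 130.1. A quick check in Python, whose integers are arbitrary-precision, so 20**100 is computed exactly (the variable names are only illustrative):

```python
import math

ALPHABET_SIZE = 20   # the 20 standard amino acids
LENGTH = 100         # protein length in residues

n_sequences = ALPHABET_SIZE ** LENGTH          # exact integer value of 20^100
log10_n = LENGTH * math.log10(ALPHABET_SIZE)   # 100 * log10(20)

print(f"20^100 has {len(str(n_sequences))} digits")   # 131 digits, i.e. ~1.3 * 10^130
print(f"log10(20^100) = {log10_n:.2f}")                # 130.10
```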
Hemoglobin
Multi-subunit protein complexes
– Homo- and hetero-oligomers
Interactions stabilizing proteins
- Hydrogen bonds: between side chains, and between side chains and the main chain
- Ionic bonds
- Hydrophobic effect
- Van der Waals interactions
- Disulphide bridges
Van der Waals interaction
Figure: van der Waals (VdW) radius
Hydrophobic effect
Figure: water molecules and hydrophilic residues
A hydrogen bond forms when a hydrogen atom covalently bound to a highly electronegative
atom (F, N or O) interacts with another electronegative atom from a different functional
group; the hydrogen atom thus bridges two other atoms.
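In structure analysis, hydrogen bonds are often detected purely from geometry: the donor (D) and acceptor (A) must be close, and the D-H...A arrangement must be close to linear. The sketch below assumes a donor-acceptor cutoff of 3.5 Å and a minimum D-H...A angle of 120°; these are commonly used values, not universal ones (programs differ), and `is_hbond` is a hypothetical helper, not a library function.

```python
import math

# Assumed cutoffs: commonly used values, but they differ between programs.
MAX_DONOR_ACCEPTOR = 3.5   # Å, donor-acceptor distance
MIN_DHA_ANGLE = 120.0      # degrees, angle D-H...A measured at the hydrogen


def angle_at(vertex, a, b):
    """Angle a-vertex-b in degrees."""
    va = [x - y for x, y in zip(a, vertex)]
    vb = [x - y for x, y in zip(b, vertex)]
    cos_t = sum(x * y for x, y in zip(va, vb)) / (math.hypot(*va) * math.hypot(*vb))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding errors


def is_hbond(donor, hydrogen, acceptor):
    """Geometric test for a D-H...A hydrogen bond (hypothetical helper)."""
    close_enough = math.dist(donor, acceptor) <= MAX_DONOR_ACCEPTOR
    linear_enough = angle_at(hydrogen, donor, acceptor) >= MIN_DHA_ANGLE
    return close_enough and linear_enough


# Hypothetical coordinates (Å): a straight N-H...O line vs. a bent arrangement.
print(is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))  # True
print(is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)))  # False (90° angle)
```

In the bent case the atoms are close enough (about 2.2 Å apart), but the 90° angle at the hydrogen fails the linearity criterion.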
H-bonds in biological systems
Electrostatic interactions
$E = \frac{q_1 q_2}{4\pi\varepsilon_0 \varepsilon r}$
- ε: dielectric constant (relative permittivity) of the medium
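For two point charges q1 and q2 at distance r, Coulomb's law gives an energy proportional to q1·q2 / (ε·r), so the dielectric constant ε directly scales how strong an ionic interaction is. A minimal sketch in the units common in molecular modelling (charges in e, distances in Å, energies in kcal/mol, with 1/(4πε0) ≈ 332.06 kcal·Å/(mol·e²)); the value ε = 4 for the protein interior is an assumed, frequently used approximation, and `coulomb_energy` is a hypothetical helper:

```python
# Coulomb energy between two point charges in a uniform dielectric.
COULOMB_CONSTANT = 332.06  # kcal·Å/(mol·e²), i.e. 1/(4*pi*eps0) in these units


def coulomb_energy(q1, q2, r, epsilon=1.0):
    """Energy in kcal/mol for charges q1, q2 (in units of e) at distance r (Å)."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (epsilon * r)


# A +1/-1 ion pair (e.g. Lys/Asp side chains) 4 Å apart in different media.
# eps = 4 for the protein interior is an assumption; water is ~80.
for label, eps in [("vacuum", 1.0), ("protein interior (assumed)", 4.0), ("water", 80.0)]:
    print(f"{label:28s} eps={eps:5.1f}  E={coulomb_energy(+1, -1, 4.0, eps):7.2f} kcal/mol")
```

Because E is inversely proportional to ε, the same ion pair is about 80 times weaker in water than in vacuum, which is why buried salt bridges and solvent-exposed ones behave so differently.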
Torsion angle
The main-chain φ and ψ torsion angles of a protein cannot take arbitrary values;
there are preferred conformations.
Sign convention: clockwise (+), counter-clockwise (-)
φ (phi): rotation around the N-Cα bond
ψ (psi): rotation around the Cα-C bond
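A torsion (dihedral) angle is defined by four consecutive atoms: φ uses C(i-1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). The sketch below computes a signed dihedral with the standard atan2 formula, which follows the convention that clockwise rotation is positive; the helper names are hypothetical, and coordinates are plain (x, y, z) tuples.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees, in (-180, 180]) around the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


# Usage with backbone atoms of residue i (coordinates from a PDB file):
#   phi = dihedral(C_prev, N, CA, C)
#   psi = dihedral(N, CA, C, N_next)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1)))  # -90.0
```

Using atan2 instead of acos keeps the sign, which is what distinguishes, for example, the right-handed α-helix region (negative φ) from the rarely populated positive-φ region.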
Where do these structures come from?
- X-ray crystallography
- NMR
- Electron microscopy
X-ray crystallography
- crystallization artefacts
- non-physiological environment
- usually no information on hydrogen atoms
NMR
- in solution
- usually yields a structural ensemble that fulfills the distance constraints
- only small proteins
- less precise model
- usable for flexible proteins as well
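The NMR output described above is an ensemble of models, each of which should satisfy the measured distance restraints. A minimal sketch of checking that, assuming restraints are given as upper bounds on interatomic distances (as for NOE-derived restraints); the atom labels, data layout and function name are hypothetical:

```python
import math


def restraint_violations(model, restraints, tolerance=0.0):
    """List (atom_a, atom_b, excess) for every upper-bound restraint the model violates.

    model:      dict mapping a (hypothetical) atom label to (x, y, z) in Å
    restraints: list of (atom_a, atom_b, upper_bound_in_Å)
    """
    out = []
    for a, b, upper in restraints:
        d = math.dist(model[a], model[b])
        if d > upper + tolerance:
            out.append((a, b, d - upper))
    return out


# Two hypothetical models and one restraint: atoms H1 and H2 must be within 5 Å.
restraints = [("H1", "H2", 5.0)]
ensemble = [
    {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 0.0, 0.0)},   # satisfies the restraint
    {"H1": (0.0, 0.0, 0.0), "H2": (6.0, 0.0, 0.0)},   # exceeds it by 1 Å
]
print([len(restraint_violations(m, restraints)) for m in ensemble])  # [0, 1]
```

Models that differ from each other but all pass such checks are equally consistent with the data, which is one reason an NMR entry contains several models rather than one.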
Cryo-EM (atomic resolution)
Nobel Prize in Chemistry 2017
PDB statistics
PDB ID: unique identifier
Each atomic coordinate file in the Protein Data Bank has a unique identifier
composed of exactly 4 characters. The first character is always a digit (1-9); the
remaining three can be digits or letters.
There are over 400,000 possible 4-character PDB IDs: 9 × 36³ = 419,904, or
10 × 36³ = 466,560 if "0" could also be the first character.
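The count follows from 36 possible symbols (10 digits + 26 letters) in each of the last three positions and 9 or 10 options for the first. The sketch below reproduces both numbers and adds a hypothetical validator for classic 4-character IDs; it assumes, as stated above, that the first character is 1-9, and it matches case-insensitively because IDs appear both in lower case (1mbn) and upper case (1A3I).

```python
import re
import string

# 36 possible symbols per position: digits 0-9 and letters a-z (case-insensitive).
SYMBOLS = len(string.digits + string.ascii_lowercase)   # 36

ids_first_nonzero = 9 * SYMBOLS ** 3     # first character 1-9
ids_first_any_digit = 10 * SYMBOLS ** 3  # if "0" were also allowed first

# Hypothetical validator for classic 4-character IDs (first char assumed to be 1-9).
PDB_ID = re.compile(r"[1-9][0-9a-z]{3}", re.IGNORECASE)


def is_pdb_id(text):
    return PDB_ID.fullmatch(text) is not None


print(ids_first_nonzero, ids_first_any_digit)  # 419904 466560
print([is_pdb_id(s) for s in ("1mbn", "1A3I", "9ins", "0abc", "mbn1")])
# [True, True, True, False, False]
```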
Examples:
• 1mbn - 1973, the first protein structure model, myoglobin
• 1tna - 1975, the first RNA structure, yeast phenylalanine transfer RNA
• 1bna - 1980, the first B-DNA double helix structure (determined by X-ray
crystallography, 27 years after Watson & Crick's theoretical model of 1953)
• 2hhd - human hemoglobin (deoxy form)
• 9ins - insulin
The .pdb file format
HEADER EXTRACELLULAR MATRIX 22-JAN-98 1A3I
TITLE X-RAY CRYSTALLOGRAPHIC DETERMINATION OF A COLLAGEN-LIKE
TITLE 2 PEPTIDE WITH THE REPEATING SEQUENCE (PRO-PRO-GLY)
...
EXPDTA X-RAY DIFFRACTION
AUTHOR R.Z.KRAMER,L.VITAGLIANO,J.BELLA,R.BERISIO,L.MAZZARELLA,
AUTHOR 2 B.BRODSKY,A.ZAGARI,H.M.BERMAN
...
REMARK 350 BIOMOLECULE: 1
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C
REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000
REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000
...
SEQRES 1 A 9 PRO PRO GLY PRO PRO GLY PRO PRO GLY
SEQRES 1 B 6 PRO PRO GLY PRO PRO GLY
SEQRES 1 C 6 PRO PRO GLY PRO PRO GLY
...
ATOM 1 N PRO A 1 8.316 21.206 21.530 1.00 17.44 N
ATOM 2 CA PRO A 1 7.608 20.729 20.336 1.00 17.44 C
ATOM 3 C PRO A 1 8.487 20.707 19.092 1.00 17.44 C
ATOM 4 O PRO A 1 9.466 21.457 19.005 1.00 17.44 O
ATOM 5 CB PRO A 1 6.460 21.723 20.211 1.00 22.26 C
...
HETATM 130 C ACY 401 3.682 22.541 11.236 1.00 21.19 C
HETATM 131 O ACY 401 2.807 23.097 10.553 1.00 21.19 O
HETATM 132 OXT ACY 401 4.306 23.101 12.291 1.00 21.19 O
...
The .pdb file format: fields of an ATOM record
- atom number and name
- residue type, chain ID, residue number
- 3D coordinates of the atom
- occupancy
- B-factor
- atom type
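The .pdb format is column-based, so these fields are safest to read by fixed character positions rather than by splitting on whitespace (adjacent fields can touch, and the listing above has lost its original column alignment). A minimal sketch, assuming the column layout of the wwPDB format description for ATOM/HETATM records; `Atom` and `parse_atom_line` are hypothetical names, and full parsers exist in libraries such as Biopython's `Bio.PDB`.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by fixed column positions.

    Comments give the 1-based PDB columns; Python slices are 0-based and end-exclusive.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    line = line.ljust(80)                  # pad short lines so every slice exists
    return Atom(
        serial=int(line[6:11]),            # cols 7-11   atom serial number
        name=line[12:16].strip(),          # cols 13-16  atom name
        res_name=line[17:20].strip(),      # cols 18-20  residue name
        chain_id=line[21].strip(),         # col  22     chain ID
        res_seq=int(line[22:26]),          # cols 23-26  residue number
        x=float(line[30:38]),              # cols 31-38  x (Å)
        y=float(line[38:46]),              # cols 39-46  y (Å)
        z=float(line[46:54]),              # cols 47-54  z (Å)
        occupancy=float(line[54:60]),      # cols 55-60  occupancy
        b_factor=float(line[60:66]),       # cols 61-66  B-factor (Å^2)
        element=line[76:78].strip(),       # cols 77-78  element symbol
    )
```

Applied to the first ATOM record of 1A3I in its original fixed-width form, this should yield serial 1, atom N of PRO 1 in chain A at (8.316, 21.206, 21.530), occupancy 1.00 and B-factor 17.44.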
Model
All protein structures are models! Structures are not directly measured, but are
generated as models that best fit the collected experimental data.
How good is a structure?
- Resolution of the collected data (X-ray)
- R-factor / R-free (X-ray): how well does the model fit the experimental measurements?
- Ramachandran plot
- Geometry and stereochemistry
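The crystallographic R-factor compares observed structure-factor amplitudes with those calculated from the model: R = Σ| |F_obs| - |F_calc| | / Σ|F_obs|. R-free is the same quantity computed on a small test set of reflections (typically around 5-10%) that is left out of refinement, so it is harder to "fit away". A minimal sketch with made-up amplitudes; the function names and data are hypothetical:

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Crystallographic R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty sequences of equal length")
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_and_free(f_obs, f_calc, is_free):
    """R for the working set (used in refinement) and for the excluded test set."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Hypothetical amplitudes for five reflections; the last one is in the test set.
f_obs = [100.0, 80.0, 60.0, 40.0, 50.0]
f_calc = [90.0, 85.0, 60.0, 30.0, 40.0]
is_free = [False, False, False, False, True]
print(r_work_and_free(f_obs, f_calc, is_free))
```

An R-free much higher than R-work is a warning sign that the model has been over-fitted to the working reflections.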
Resolution (X-ray):
- Very low: > 4 Å. Individual coordinates cannot be interpreted.
- Low: 3.0-4.0 Å. The fold is recognizable.
- Average: 1.8-3.0 Å. The majority of the structure is correct, but some side-chain
rotamers may be incorrect and surface loop conformations are unreliable.
- Good: 1.0-1.8 Å
- Atomic level: < 1.0 Å
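The resolution ranges above translate directly into a lookup. Because the listed ranges share their endpoints, the sketch has to pick a side: it assumes a value exactly on 1.0, 1.8 or 3.0 Å falls into the worse class, and exactly 4.0 Å still counts as "low". `resolution_class` is a hypothetical helper.

```python
def resolution_class(d):
    """Classify X-ray resolution d (Å) using the ranges listed above.

    Assumption: a value on a boundary (1.0, 1.8, 3.0) goes to the worse class,
    and exactly 4.0 Å still counts as "low".
    """
    if d <= 0:
        raise ValueError("resolution must be positive")
    if d < 1.0:
        return "atomic level"
    if d < 1.8:
        return "good"
    if d < 3.0:
        return "average"
    if d <= 4.0:
        return "low"
    return "very low"


print([resolution_class(d) for d in (0.9, 1.5, 2.2, 3.5, 4.5)])
# ['atomic level', 'good', 'average', 'low', 'very low']
```

Note that a smaller number means higher resolution, which is why the comparisons run from the best class to the worst.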
Figure: steps of structure visualization (2. visualization program, 3. computer, 4. molecule image)
Myoglobin
Figure: surface and buried residues
Secondary structures are stabilized
by H-bonds
Secondary structure determination can be based on:
- H-bond patterns
- dihedral angles
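One widely used H-bond-based method (DSSP by Kabsch & Sander, which the slide does not name) treats the backbone C=O and N-H groups as point charges (±0.42 e on C/O, ±0.20 e on N/H) and calls it a hydrogen bond when the resulting electrostatic energy is below -0.5 kcal/mol. The sketch below implements only that energy criterion, not the full helix/strand assignment; the function names and example coordinates are hypothetical.

```python
import math

# DSSP electrostatic model: partial charges +/-0.42 e on C and O, +/-0.20 e on N and H.
Q1, Q2 = 0.42, 0.20
F = 332.0           # kcal·Å/(mol·e²)
CUTOFF = -0.5       # kcal/mol; energies below this count as an H-bond


def dssp_hbond_energy(c, o, n, h):
    """Energy (kcal/mol) of the C=O ... H-N interaction, atom coordinates in Å."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_backbone_hbond(c, o, n, h):
    return dssp_hbond_energy(c, o, n, h) < CUTOFF


# Hypothetical linear C=O...H-N geometry: O...H 1.9 Å, O...N 2.9 Å.
c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h, n = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(dssp_hbond_energy(c, o, n, h), 2), is_backbone_hbond(c, o, n, h))
```

A secondary structure assigner then looks for regular patterns of such bonds, e.g. residue i bonded to i+4 repeatedly for an α-helix.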
CATH (Class, Architecture, Topology, Homologous superfamily)
Fold = topology (T) level
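CATH identifies each classified domain group with a four-part code C.A.T.H, so "same fold" means "same first three numbers". A small sketch of that data structure; the class names are the four main CATH classes, while `CathCode`, `parse_cath` and the example code are only illustrative:

```python
from typing import NamedTuple

# The four main CATH classes (top level).
CATH_CLASSES = {
    1: "Mainly alpha",
    2: "Mainly beta",
    3: "Alpha beta",
    4: "Few secondary structures",
}


class CathCode(NamedTuple):
    class_: int          # C
    architecture: int    # A
    topology: int        # T (fold)
    superfamily: int     # H (homologous superfamily)

    @property
    def fold(self) -> str:
        """Domains sharing the first three numbers share the same fold (topology)."""
        return f"{self.class_}.{self.architecture}.{self.topology}"


def parse_cath(code: str) -> CathCode:
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected C.A.T.H, got {code!r}")
    return CathCode(*(int(p) for p in parts))


c = parse_cath("3.40.50.300")            # example code
print(CATH_CLASSES[c.class_], c.fold)    # Alpha beta 3.40.50
```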
Membrane proteins are important for:
- energy production
- transport
- cell-cell junctions
- signaling
They are also major drug targets.
Hydrophobicity of membrane proteins
Figure: aquaporin (two views, one of them a cross-section)
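Because membrane-spanning helices are strongly hydrophobic, they can often be spotted from sequence alone with a sliding-window hydropathy profile. A minimal sketch using the Kyte-Doolittle hydropathy scale; the window of 19 residues and the threshold of 1.6 are commonly quoted choices for transmembrane helices and are assumptions here, and the function names and test sequence are hypothetical:

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window; entry i covers residues i .. i+window-1."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def tm_segment_starts(seq, window=19, threshold=1.6):
    """Window start positions whose mean hydropathy exceeds the (assumed) threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


# Hypothetical sequence: a polar stretch, a 19-residue Leu run, another polar stretch.
print(tm_segment_starts("K" * 10 + "L" * 19 + "K" * 10))
# [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

Windows centred on the leucine run score highest; a real prediction would merge the overlapping windows into one candidate helix.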
Structure determination of transmembrane proteins
Approx. 2% of PDB structures
Conformational changes
Proteins are dynamic molecules
- X-ray: B-factor
- NMR: structural variability within the ensemble
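For X-ray structures, the isotropic B-factor is related to the mean-square displacement of an atom along one direction by B = 8π²⟨u²⟩, so it can be converted into an approximate amplitude of motion. A minimal sketch; keep in mind that B also absorbs static disorder and model errors, so it is an upper-bound-style indicator of mobility rather than a pure measure of dynamics. `rms_displacement` is a hypothetical helper:

```python
import math


def rms_displacement(b_factor):
    """sqrt(<u^2>) in Å from an isotropic B-factor in Å^2, using B = 8*pi^2*<u^2>."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8 * math.pi ** 2))


# B-factors from the first ATOM records of 1A3I shown earlier (17.44 and 22.26 Å^2):
for b in (17.44, 22.26):
    print(f"B = {b:5.2f} Å^2  ->  rms displacement ≈ {rms_displacement(b):.2f} Å")
```

So a typical B of about 17 Å² corresponds to roughly half an ångström of positional spread.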
Missing structure parts
tegniddsliggnasaegpegegtestv
For these regions, structures are only available from complexes. Why?
Intrinsically disordered proteins (IDPs)
●Proteins that do not form a well-defined three-dimensional structure
●More than 50% of human proteins are either fully or partially disordered
●Typical functions: regulation, apoptosis, signal transduction, stress response
●Malfunctions can lead to diseases
− p53 → cancer
− τ protein → Alzheimer's disease
− α-synuclein → Parkinson's disease
Ensemble characterization for IDPs
Experimental methods cannot detect a single conformation; they measure only
time or ensemble averages.