
Intro to Bioinfo

Bioinformatics can change one's life by analyzing vast amounts of biological data. It involves merging computer science and molecular biology to solve problems in biology. Key events included the human genome being sequenced in 2000 in a major international effort. Bioinformatics uses powerful computing and data analysis methods to make sense of the rapid growth of biological data from sequencing technologies. It provides essential tools to store, organize, and analyze this deluge of information.

Uploaded by

anshul
Copyright
© Attribution Non-Commercial (BY-NC)

How Bioinformatics can change your life

Basic Concepts of Bioinformatics


Introduction…

2000
• A major event happened that was to change the course of human history
• It was an international effort, with British and American centres doing much of the sequencing
• It was a race: who would finish first?
• Race test: not whether they have taken drugs, but whether they can produce them!
• A working draft of the human genome was sequenced

Bioinformatics is:
driven by the generation of data, moderated by hardware and analysis methods
• Computing power
• Analysis methods
• Data generation platforms

What is Bioinformatics?
• The merging of computer science and molecular biology
• The algorithms and techniques of computer science are being used to solve the problems faced by molecular biologists
• 'Information technology applied to the management and analysis of biological data'
• Storage and analysis are two of its important functions; bioinformaticians build tools for each

Diagram: Biology, Chemistry, Computer Science and Statistics combine in Bioinformatics

What is Bioinformatics…
• This is the age of information technology
• However, storing information is nothing new
• Information equal in volume to the Encyclopaedia Britannica is stored in each of our cells
• 'Bioinformatics tries to determine what information is biologically important'

Basics of Molecular Biology…

DNA & Genes
• DNA is where the genetic information is stored
• Traits such as blonde hair and blue eyes are inherited through it
• Gene: the basic unit of heredity
• There are genes for characteristics, e.g. a gene for blonde hair
• Genes contain their information as a sequence of nucleotides
• Genes are abstract concepts, like lines of longitude and latitude, in the sense that you cannot see them separately
• Genes are made up of nucleotides

Nucleotide (nt)
• Each nt is made up of
  ◦ Sugar
  ◦ Phosphate group
  ◦ Base
• The base it contains is the only difference between one nt and another
• There are 4 different bases
  ◦ G(uanine), A(denine), T(hymine), C(ytosine)
• The information is in the order of the nucleotides: the order is the information
• Genes can be many thousands of nt long
• The complete set of genetic instructions is called the genome

Proteins
• Proteins are very important biological molecules
• Amino acids (aa) make up proteins
• There are 20 different amino acids
• The function of a protein depends on the order of its amino acids

Proteins…
• The information required to make proteins is stored in DNA
• DNA sequence determines amino acid sequence
• Amino acid sequence determines protein structure
• Protein structure determines protein function
• A substance called RNA carries the information stored in the DNA, which in turn is used to make proteins
• Storage: DNA
• Information transfer: RNA
• RNA is the messenger boy!

Central dogma

DNA → (transcription, by RNA polymerase) → RNA → (translation, by ribosomes) → Protein

Proteins…
• There are 20 amino acids to encode, so one nt cannot correspond to one aa (only 4 possibilities), and neither can a pair of nt (4 × 4 = 16 possibilities)
• So protein information is carried in triplet codes, called codons (4 × 4 × 4 = 64 possibilities)
• The codons that do not code for an amino acid are the stop codons: UAA, UAG, UGA (RNA has U instead of T)
• Some codons are used as start codons: AUG marks the start and also codes for methionine

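The codon idea above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the 64-entry table is the standard genetic code written out in U, C, A, G order, and "transcription" is simplified to replacing T with U on the coding strand.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Simplified transcription: treat `dna` as the coding strand and swap T for U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon: UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCCTAA")))  # prints MA (methionine, alanine)
```

The table has 4 × 4 × 4 = 64 entries, three of which are stop codons, which is the counting argument the slide makes.
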
Protein Structure
• Shows a wide variety, as opposed to DNA, whose structure is uniform
• X-ray crystallography or Nuclear Magnetic Resonance (NMR) is used to determine the structure
• Structure is related to function; or rather, structure determines function
• Although proteins are created as a linear chain of aa, they fold into a 3D structure
• If you stretch them out and let go, they return to this structure: the native structure of the protein
• Proteins function properly only in their native structure
• Even after translation is over, a protein goes through some changes to its structure

Bioinformatics Techniques…

Prediction and Pattern Recognition
• The two main areas of bioinformatics are
• Pattern recognition
  ◦ 'A particular sequence or structure has been seen before', and a particular characteristic can be associated with it
• Prediction
  ◦ From a sequence (what we know) we can predict the structure and function (what we don't know)

Dot plots…
• A simple way of evaluating similarity between two sequences
• One sequence is placed along one axis of a graph and the other sequence along the other axis
• Wherever the two sequences match, the graph is marked

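A dot plot is simple enough to sketch directly. The Python below is a minimal illustration (the function names are invented for this example): it marks every position where a character of one sequence equals a character of the other. Real dot plot tools usually also filter the plot with a sliding window to reduce noise, which is left out here.

```python
def dot_plot(seq_a, seq_b):
    """Return a grid where grid[i][j] is True when seq_a[i] equals seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(seq_a, seq_b):
    """Draw the grid as text: seq_b across the top, seq_a down the side."""
    lines = ["  " + seq_b]
    for a, row in zip(seq_a, dot_plot(seq_a, seq_b)):
        lines.append(a + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("TTACTATA", "TAGATA"))
```

Runs of matches show up as diagonal lines of `*`, which is what the eye picks out as a region of similarity.
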
Alignments
• A matching-up of the characters of two or more sequences to show their similarity
• E.g.
  ◦ TTACTATA
  ◦ TAGATA
• There are many ways to align these two sequences, depending on where the shorter sequence is placed and where gaps are inserted
• So which one do we choose, and on what basis?
• The solution is to assign a match score and a mismatch score (and usually a gap penalty), and to prefer the alignment with the best total

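To make the scoring idea concrete, here is a small Python sketch. The score values (match = 1, mismatch = 0, gap = -1) and the three candidate alignments are assumptions chosen for illustration, not taken from the slides; gaps are written as `-`.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score two already-aligned strings of equal length, column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # a gap in either sequence
        elif a == b:
            total += match      # identical characters
        else:
            total += mismatch   # different characters
    return total


# Three hypothetical ways of aligning TAGATA against TTACTATA.
candidates = ["T-AG-ATA", "--TAGATA", "TAGATA--"]
for bottom in candidates:
    print(bottom, score_alignment("TTACTATA", bottom))
```

With these scores the first candidate comes out ahead, which gives an objective basis for choosing between alignments.
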
Dynamic Programming
• As the query sequences get longer, and the difference in length between the two sequences increases, more gaps have to be inserted in various places
• We cannot perform an exhaustive search
• Combinatorial explosion occurs: there are too many combinations to search
• Dynamic programming avoids this by building the best alignment from the best alignments of shorter prefixes, so each sub-problem is solved only once and the result is still optimal

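A minimal sketch of dynamic programming for global alignment, in the style of the Needleman-Wunsch algorithm, is shown below in Python. The scoring values are assumptions for illustration, and the function only returns the best score rather than the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=0, gap=-1):
    """Best global alignment score of a and b, filled in one table cell at a time."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


print(global_alignment_score("TTACTATA", "TAGATA"))
```

The table has (len(a) + 1) × (len(b) + 1) cells and each is computed once, so the work grows with the product of the sequence lengths instead of with the number of possible alignments.
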
Databases
• Sequence information is stored in databases
• So that it can be manipulated easily
• The databases (next slide) are located in different places
• They exchange information on a daily basis so that they are up to date and in sync
• Primary db: sequence data

Nucleic acid (DNA/RNA) sequence databases
• One main database arising from a partnership between GenBank at the NCBI (National Center for Biotechnology Information, USA), the EMBL data library at the EBI (European Bioinformatics Institute, UK) and the DNA Data Bank of Japan (DDBJ) at the NIG (National Institute of Genetics, Japan).
• Daily exchanges between the 3 partners keep the databases synchronised.
• DNA and RNA sequences: curated, archived, distributed.
• Sequences come from genome projects, scientific articles and patent applications. Most scientific journals require DNA and RNA sequences related to each publication to be publicly available.
• Sequences are deposited early and go through a review cycle: unannotated, preliminary, unreviewed, standard.
• Format: human and computer readable.

Major Primary DB

Nucleic Acid      Protein
EMBL (Europe)     PIR (Protein Information Resource)
GenBank (USA)     MIPS, NCBI
DDBJ (Japan)      SWISS-PROT (University of Geneva, now with EBI)
NCBI              TrEMBL (a supplement to SWISS-PROT)
                  NRL-3D

Composite DB
• As there are many db, which one should we search? Some are good in some aspects and weak in others
• A composite db is the answer: it draws on several db for its base data
• Searching a composite db is indexed and streamlined, so that the same stored sequence is not searched twice in different db

Composite DB
• OWL has these as its primary db
  ◦ SWISS-PROT (top priority)
  ◦ PIR
  ◦ GenBank
  ◦ NRL-3D

Secondary db
• Store information derived from the primary db (e.g. conserved patterns and motifs), i.e. the results of analyses or searches of the primary db

Secondary DB   Primary source
PROSITE        SWISS-PROT
PRINTS         OWL

Structural databases
• The main database of protein structures is the PDB (Protein Data Bank).
• The PDB started in 1971 at Brookhaven National Laboratory (NY, USA) and is now a distributed organisation (Research Collaboratory for Structural Bioinformatics, www.rcsb.org) of US partners (Rutgers, NJ; San Diego Supercomputer Center, CA; NIST, MD).
• The PDB includes protein structures (and a few DNA and other structures) determined by X-ray crystallography and Nuclear Magnetic Resonance.

Database Searches
• Many genes have already been sequenced and identified, so we know what they do
• Their sequences are stored in databases
• So if we find a new gene in the human genome, we compare it with the already known genes stored in the databases
• Since the databases hold a very large number of sequences, we cannot run a full sequence alignment against each and every one
• So heuristic search methods (e.g. BLAST, FASTA) are used to find the most promising candidates quickly

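One common heuristic is to look first for short exact words shared between the query and the database sequences, and only align the sequences that share some. The Python below is a toy sketch of that word-seeding idea, loosely inspired by tools such as BLAST and FASTA but not a reproduction of either; the database contents, names and word length `k` are all made up for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """All distinct words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(database, k=3):
    """Map each word to the names of the database sequences that contain it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for word in kmers(seq, k):
            index[word].add(name)
    return index


def candidates(query, index, k=3):
    """Rank database sequences by how many words they share with the query."""
    hits = defaultdict(int)
    for word in kmers(query, k):
        for name in index.get(word, ()):
            hits[name] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


database = {"seq1": "TTACTATA", "seq2": "GGGCCC"}  # hypothetical entries
index = build_index(database)
print(candidates("TAGATA", index))
```

Only the sequences returned here would then be passed to a slower, exact alignment step, which is where the time saving comes from.
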
Areas in Bioinformatics…

Genomics
• In a multicellular organism, each cell type expresses its genes in a different way, although every cell has the same genetic content
• I.e. all the information for a liver cell to be a liver cell is also present in a nose cell, so gene expression is the only thing that differentiates them

Genomics - Finding Genes
• Finding a gene in sequence data is like finding a needle in a haystack
• However, whereas a needle looks different from the hay, genes do not look different from the rest of the sequence data
• In a long string of nt, we try to find a set of nt and mark its borders as a gene
• This is one of the challenges of bioinformatics
• Neural networks and dynamic programming are being employed

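Real gene finders use the statistical methods mentioned above. As a much simpler stand-in, the Python sketch below scans the forward strand for open reading frames (ORFs): stretches that begin with ATG and run to the first in-frame stop codon. It is a swapped-in, naive technique meant only to show what "marking the borders" looks like; the function name and the `min_codons` parameter are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs, end exclusive."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three possible reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3)) # include the stop codon
                start = None
    return sorted(orfs)


dna = "CCATGGCCTAAGG"
for begin, end in find_orfs(dna):
    print(begin, end, dna[begin:end])
```

A scan like this finds many false candidates in real genomes and ignores introns and the reverse strand, which is part of why gene finding remains a hard problem.
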
Organism       Genome size (Mb)   Gene number   Web site
Yeast          13.5               6,241         http://genome-www.stanford.edu/Saccharomyces
Fruit flies    180                13,601        http://flybase.bio.indiana.edu
Homo sapiens   3,000              45,000        http://www.ncbi.nlm.nih.gov/genome/guide

(1 Mb = 1,000,000 bp)

Proteomics
• The proteome is the sum total of an organism's proteins
• More difficult than genomics
  ◦ 4 nucleotides versus 20 amino acids
  ◦ Simple chemical makeup versus complex
  ◦ DNA can be duplicated; proteins can't
• We are entering the 'post-genome era'
• Meaning much has been done with genes, not that it is all over

Proteomics…
• The relationship between an RNA and the protein it codes for is often not straightforward
  ◦ After translation, proteins do change
  ◦ So the aa sequence does not tell us anything about the post-translational changes
• Proteins are not active until they are combined into a larger complex or moved to a relevant location inside or outside the cell
  ◦ So the aa sequence only hints at these things
• Also, proteins must be handled more carefully in the lab, as they tend to change when in contact with an inappropriate material

Protein Structure Prediction
• One of the biggest challenges of bioinformatics, and especially of biochemistry
• For a long time no algorithm could consistently predict the structure of proteins from sequence alone

Structure Prediction methods
• Comparative modeling
  ◦ The target protein's structure is modelled by comparison with related proteins
  ◦ Proteins with similar sequences and known structures are searched for

Phylogenetics
• The taxonomic system reflects evolutionary relationships
• Phylogenetic trees depict evolutionary relationships as a picture/graph
  ◦ Rooted trees, in which there is a single common ancestor
  ◦ Unrooted trees, which just show the relationships
• Phylogenetic tree reconstruction algorithms are also an area of research

Applications…

Medical Implications
• Pharmacogenomics
  ◦ Not all drugs work on all patients; some good drugs cause death in some patients
  ◦ So by doing a gene analysis before the treatment, the harmful drugs can be avoided
  ◦ Also, drugs which cause death in most patients can be used on the minority whose genes are well suited to them (volunteers wanted!)
  ◦ Customized treatment
• Gene therapy
  ◦ Replace or supply the defective or missing gene
  ◦ E.g. insulin, and Factor VIII for haemophilia
• BioWeapons (??)

Diagnosis of Disease
• Identifying the genes which cause a disease helps to detect it at an early stage, e.g. Huntington's disease
  ◦ Symptoms: uncontrollable dance-like movements, mental disturbance, personality changes and intellectual impairment
  ◦ Death in 10-15 years
  ◦ The gene responsible for the disease has been identified

Drug Design
• Can take up to 15 years and cost $700 million
• One of the goals of bioinformatics is to reduce the time and cost involved
• The process
  ◦ Discovery (computational methods can improve this)
  ◦ Testing

Discovery

Target identification
• Identifying the molecule on which the germ relies for its survival
• Then we develop another molecule, i.e. a drug, which will bind to the target
• So the germ will not be able to interact with the target
• Proteins are the most common targets

Discovery…
• For example, HIV produces HIV protease, a protein which in turn cuts up other proteins
• HIV protease has an active site where it binds to other molecules
• So an HIV drug will go and bind to that active site

Discovery…
• Lead compounds are the molecules that go and bind to the target protein's active site
• Traditionally, finding them has been a trial-and-error process
• Now this is being moved into the realm of computers

Related Computer Technology…

PERL
• Perl is commonly used for bioinformatics calculations because of its ability to manipulate character strings
• Long the default language for CGI scripts
• It started out as a scripting language but has become a fully fledged language
• It has everything now, even web service support
• http://bio.perl.org

The place of XML & Web Services
• Various markup languages (Gene Markup Language etc.) are being created to represent sequence/gene data
• Web services: program-to-program interaction, making the web application-centric as opposed to human-centric
• So this has to be platform- and language-independent
• Protocols like SOAP help in this regard
• In bioinformatics, various databases, platforms, languages etc. are in use
• So web services help achieve platform independence and program interaction
• Since sequence databases come in various formats and on various platforms, SOAP also helps in this regard

Databases and Mining
• A lot of the sequence databases are publicly available
• As databases are involved, various data mining techniques are used to pull the data out
• As there is a lot of literature (articles etc.) in this area, data mining is also applied to the literature

European Molecular Biology Network (EMBnet)
• A central system for sharing and centralizing up-to-date bio information, and for training
• Some of the EMBnet sites are:
  ◦ SEQNET
    http://www.seqnet.dl.ac.uk
  ◦ UCL
    http://www.biochem.ucl.ac.uk/bsm/dbbrowser/embnet/
  ◦ EBI (European Bioinformatics Institute)
    www.ebi.ac.uk

References
• Dan E. Krane and Michael L. Raymer, Basic Concepts of Bioinformatics
• Arthur M. Lesk, Intro to Bioinformatics
• T.K. Attwood & D.J. Parry-Smith, Intro to Bioinformatics
• Dr Patrick Dixon, The Genetic Revolution
• Prof David Gilbert's site, http://www.brc.dcs.gla.ac.uk/~drg/
