Into To Bioinfo

Bioinformatics can change one's life by analyzing vast amounts of biological data. It involves merging computer science and molecular biology to solve problems in biology. Key events included the human genome being sequenced in 2000 in a major international effort. Bioinformatics uses powerful computing and data analysis methods to make sense of the rapid growth of biological data from sequencing technologies. It provides essential tools to store, organize, and analyze this deluge of information.

Uploaded by

anshul

How Bioinformatics can change your life

Basic Concepts of Bioinformatics

Introduction……

2000
• A major event took place that was to change the course of human history.
• It was a joint British and American effort.
• It was a race: who would finish first?
• A "race test": not whether they have taken drugs, but whether they can produce them!
• The human genome was sequenced (a working draft was announced that year).

Bioinformatics is:
driven by the generation of data, moderated by hardware and analysis methods

[Diagram labels: Computing power, Analysis methods, Data generation platforms]

What is Bioinformatics?
• The merging of computer science and molecular biology.
• The algorithms and techniques of computer science are used to solve the problems faced by molecular biologists.
• ‘Information technology applied to the management and analysis of biological data’
• Storage and analysis are two of its most important functions; bioinformaticians build tools for each.

[Diagram labels: Biology, Chemistry, Computer Science, Statistics, Bioinformatics]

What is Bioinformatics? …
• This is the age of information technology.
• However, storing information is nothing new.
• Information equal in volume to the Encyclopaedia Britannica is stored in each of our cells.
• ‘Bioinformatics tries to determine what information is biologically important’

Basics of Molecular Biology….

DNA & Genes
• DNA is where genetic information is stored.
• Traits such as blond hair and blue eyes are inherited through it.
• Gene: the basic unit of heredity.
• There are genes for characteristics, e.g. a gene for blond hair.
• Genes carry their information as a sequence of nucleotides.
• Genes are abstract concepts, like lines of longitude and latitude, in the sense that you cannot see them separately.
• Genes are made up of nucleotides.

Nucleotide (nt)
• Each nt is made up of
  • Sugar
  • Phosphate group
  • Base
• The base is the only difference between one nt and another.
• There are 4 different bases
  • G(uanine), A(denine), T(hymine), C(ytosine)
• The information is in the order of the nucleotides; the order is the information.
• Genes can be many thousands of nt long.
• The complete set of genetic instructions is called the genome.

Proteins
• Proteins are very important biological molecules.
• Amino acids (aa) make up proteins.
• There are 20 different amino acids.
• The function of a protein depends on the order of its amino acids.

Proteins…
• The information required to assemble amino acids (aa) into proteins is stored in DNA.
• DNA sequence determines amino acid sequence.
• Amino acid sequence determines protein structure.
• Protein structure determines protein function.
• A substance called RNA carries the information stored in the DNA, which in turn is used to make proteins.
• Storage: DNA
• Information transfer: RNA
• RNA is the messenger!

Central dogma

DNA --(transcription: RNA polymerase)--> RNA --(translation: ribosomes)--> Protein
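The transcription step can be sketched in a few lines of Python. This is a minimal illustration, not a model of what RNA polymerase actually does: it assumes the DNA is given as its coding strand, so that the RNA is the same sequence with T replaced by U. The function name `transcribe` is invented for this example.

```python
def transcribe(dna: str) -> str:
    """Return the RNA for a DNA coding strand (assumption: coding strand is given)."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("DNA may only contain A, C, G and T")
    # RNA uses uracil (U) where DNA uses thymine (T).
    return dna.replace("T", "U")


print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```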

Proteins…..
• Since there are 20 amino acids to encode, one nt cannot correspond to one aa (only 4 possibilities), and neither can a pair of nt (4 × 4 = 16 possibilities).
• So protein information is carried in triplet codes called codons (4 × 4 × 4 = 64 possibilities).
• The codons that do not correspond to an amino acid are the stop codons: UAA, UAG, UGA (RNA has U instead of T).
• Some codons are used as start codons: AUG is the usual one, and it also codes for methionine.
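The codon rules above can be tried out with a short Python sketch. It builds the 64-entry standard genetic code from a compact string (one-letter amino acid codes, `*` for stop) and translates from the first AUG to the first stop codon. The function name `translate` and the example mRNA are made up for illustration, and the input is assumed to be upper-case RNA.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order at each position; "*" is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE))               # 64
print(translate("GGAUGGCCUGGUAAGC"))  # MAW
```

Here AUG gives M (methionine), GCC gives A, UGG gives W, and UAA ends translation.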

Protein Structure
• Shows wide variety, as opposed to DNA, whose structure is uniform.
• X-ray crystallography or nuclear magnetic resonance (NMR) is used to determine the structure.
• Structure is related to function; or rather, structure determines function.
• Although proteins are created as a linear chain of aa, they fold into a 3D structure.
• If you stretch them and let go, they return to this structure: the native structure of the protein.
• Proteins function well only in their native structure.
• Even after translation is over, a protein goes through some changes to its structure.

Bioinformatics Techniques…..

Prediction and Pattern Recognition
• The two main areas of bioinformatics are
  • Pattern recognition
    • Recognising that ‘a particular sequence or structure has been seen before’ and that a particular characteristic can be associated with it
  • Prediction
    • From a sequence (what we know) we can predict the structure and function (what we don’t know)
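As a small illustration of pattern recognition, the sketch below searches a protein sequence for a known motif. It uses the N-glycosylation consensus, written N-{P}-[ST]-{P} in PROSITE notation, expressed as a Python regular expression. The function name `find_motif` and the example sequence are invented; a hit only suggests that the characteristic may apply.

```python
import re

# N-glycosylation consensus N-{P}-[ST]-{P}:
# Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# The lookahead lets overlapping occurrences be reported too.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched residues) for every occurrence of the motif."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein)]


print(find_motif("MKNGSAANPTLNLTQ"))  # [(3, 'NGSA'), (12, 'NLTQ')]
```

The N at position 8 is not reported because it is followed by a proline.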

Dot plots….
• A simple way of evaluating similarity between two sequences.
• In a graph, one sequence is placed along one axis and the other sequence along the other axis.
• Wherever the two sequences match, the graph is marked.
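A text-only dot plot is easy to sketch in Python. The version below is a minimal illustration (no windowing or filtering, which real dot plot tools usually add): it marks every position where a character of one sequence equals a character of the other. The function name `dot_plot` is invented here.

```python
def dot_plot(seq_a: str, seq_b: str) -> list[str]:
    """One row per character of seq_b; '*' marks a match with seq_a, '.' a mismatch."""
    return ["".join("*" if a == b else "." for a in seq_a) for b in seq_b]


a, b = "TTACTATA", "TAGATA"
print("  " + a)
for ch, row in zip(b, dot_plot(a, b)):
    print(ch + " " + row)
```

Runs of matches along a diagonal in the printed grid correspond to stretches that the two sequences share.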

Alignments
• A matching-up of the characters of two or more sequences to assess their similarity
• E.g.
  • TTACTATA
  • TAGATA
• There are many ways to align these two sequences
• 1.
  • TTACTATA
  • TAGATA
• 2.
  • TTACTATA
  • TAGATA
• 3.
  • TTACTATA
  • TAGATA
• So which one do we choose, and on what basis?
• The solution is to assign a match score and a mismatch score, and to prefer the alignment with the best total.
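Scoring makes the choice between candidate alignments concrete. The sketch below assumes illustrative values that are not from the slides (match +1, mismatch -1, gap -2), and the three gap placements are likewise just examples. The function name `alignment_score` is invented.

```python
def alignment_score(row_a: str, row_b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score two equal-length aligned rows column by column; '-' denotes a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


# Three example ways of placing TAGATA against TTACTATA.
candidates = [
    ("TTACTATA", "TAGATA--"),
    ("TTACTATA", "--TAGATA"),
    ("TTACTATA", "T-A-GATA"),
]
for top, bottom in candidates:
    print(top, bottom, alignment_score(top, bottom))
```

With these scores the first two candidates total -4 and the third totals 0, so the third would be preferred.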

Dynamic Programming
• As the query sequences get longer, and as the difference in length between the two sequences grows, more gaps have to be inserted in various places.
• We cannot perform an exhaustive search.
• A combinatorial explosion occurs: there are too many combinations to search.
• Dynamic programming avoids this by building the best alignment from the best alignments of shorter sub-sequences, so each sub-problem is solved only once.
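A compact way to see dynamic programming at work is the Needleman-Wunsch recurrence for global alignment. The sketch below computes only the best score, not the alignment itself, and assumes illustrative scores (match +1, mismatch -1, gap -2) that are not from the slides. The function name is invented.

```python
def global_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch: best global alignment score of a and b."""
    # prev[j] is the best score for aligning the prefix of a processed so far with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("TTACTATA", "TAGATA"))
```

The table has only (len(a) + 1) × (len(b) + 1) cells, so the work grows with the product of the two lengths rather than with the number of possible alignments.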

Databases
• Sequence information is stored in databases
• so that it can be manipulated easily.
• The databases (next slide) are located in different places.
• They exchange information daily so that they stay up to date and in sync.
• Primary db: sequence data

Nucleic acid (DNA/RNA) sequence databases
• One main database arising from a partnership between GenBank at the NCBI (National Center for Biotechnology Information, USA), the EMBL data library at the EBI (European Bioinformatics Institute, UK) and the DNA Data Bank of Japan (DDBJ) at the NIG (National Institute of Genetics, Japan).
• Daily exchanges between the 3 partners keep the databases synchronised.
• DNA and RNA sequences: curated, archived, distributed.
• Sequences come from genome projects, scientific articles and patent applications. Most scientific journals require the DNA and RNA sequences related to each publication to be publicly available.
• Sequences are deposited early and go through a review cycle: unannotated, preliminary, unreviewed, standard.
• Format: human and computer readable.

Major Primary DB

Nucleic Acid   | Protein
EMBL (Europe)  | PIR (Protein Information Resource)
GenBank (USA)  | MIPS, NCBI
DDBJ (Japan)   | SWISS-PROT (University of Geneva, now with EBI)
NCBI           | TrEMBL (a supplement to SWISS-PROT)
               | NRL-3D

Composite DB
• As there are many databases, which one should we search? Some are strong in some aspects and weak in others.
• A composite db is the answer: it draws on several databases for its base data.
• Searching a composite db is indexed and streamlined, so that the same stored sequence is not searched twice in different databases.
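The "not searched twice" idea can be sketched as a merge that keeps each distinct sequence once, taking it from the highest-priority source that holds it. This is only an illustration of the principle, not how any particular composite database is built. The function name, entry ids and sequences are all invented.

```python
def build_composite(sources: list[tuple[str, dict[str, str]]]) -> dict[str, tuple[str, str]]:
    """Merge databases in priority order, keeping each distinct sequence once.

    sources: (database name, {entry id: sequence}) pairs, highest priority first.
    Returns {sequence: (database name, entry id)} for the first database holding it.
    """
    composite: dict[str, tuple[str, str]] = {}
    for db_name, entries in sources:
        for entry_id, sequence in entries.items():
            # setdefault leaves an existing (higher-priority) entry untouched.
            composite.setdefault(sequence, (db_name, entry_id))
    return composite


# Toy data: ids and sequences are made up for illustration.
sources = [
    ("SWISS-PROT", {"sp1": "MKTAYIAK", "sp2": "MVLSPADK"}),
    ("PIR", {"pir1": "MKTAYIAK", "pir2": "MNIFEMLR"}),
]
for seq, origin in build_composite(sources).items():
    print(seq, origin)
```

The sequence shared by both toy sources appears once, attributed to the higher-priority source.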

Composite DB
• OWL has these as its primary databases
  • SWISS-PROT (top priority)
  • PIR
  • GenBank
  • NRL-3D

Secondary db
• Store secondary structure information or the results of searches of the primary db

Compo DB | Primary Source
PROSITE  | SWISS-PROT
PRINTS   | OWL

Structural databases
• The main database of protein structures is the PDB (Protein Data Bank).
• The PDB started in 1971 at Brookhaven National Laboratory (NY, USA) and is now a distributed organisation (Research Collaboratory for Structural Bioinformatics, www.rcsb.org) of US partners (Rutgers, NJ; San Diego Supercomputer Center, CA; NIST, MD).
• The PDB includes protein structures (and a few DNA and other structures) determined by X-ray crystallography and nuclear magnetic resonance.

Database Searches
• We have sequenced and identified many genes, so we know what they do.
• Their sequences are stored in databases.
• So if we find a new gene in the human genome, we compare it with the already known genes stored in the databases.
• Since the databases hold a very large number of sequences, we cannot run a full sequence alignment against each and every one.
• So heuristics must be used.
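One common heuristic is to index short words (k-mers) and align only against the database sequences that share words with the query. Word lookup of this kind is the idea behind search tools such as BLAST and FASTA, but the sketch below is a simplified illustration and not either program's algorithm. The function names, the word length k = 3 and the toy database are assumptions.

```python
from collections import defaultdict


def build_index(database: dict[str, str], k: int = 3) -> dict[str, set[str]]:
    """Map every length-k word to the ids of the sequences that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for seq_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def candidates(query: str, index: dict[str, set[str]], k: int = 3) -> dict[str, int]:
    """Count distinct query words shared with each sequence; only these need full alignment."""
    hits: dict[str, int] = defaultdict(int)
    for word in {query[i:i + k] for i in range(len(query) - k + 1)}:
        for seq_id in index.get(word, ()):
            hits[seq_id] += 1
    return dict(hits)


# Toy database with invented ids and sequences.
db = {"gene_a": "TTACTATA", "gene_b": "GGGCCCGG", "gene_c": "CTAGATAC"}
print(candidates("TAGATA", build_index(db)))
```

In this toy run `gene_c` shares four words with the query, `gene_a` shares one, and `gene_b` shares none, so it would never be aligned at all.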

Areas in Bioinformatics…

Genomics
• In a multicellular organism, each cell type expresses its genes in a different way, although every cell has the same genetic content.
• I.e. all the information a liver cell needs to be a liver cell is also present in a nose cell, so gene expression is the only thing that differentiates them.

Genomics - Finding Genes
• A gene in sequence data is like a needle in a haystack.
• However, whereas a needle looks different from the hay, genes look no different from the rest of the sequence data.
• In a whole array of nt, we try to find a set of nt and mark its borders as a gene.
• This is one of the challenges of bioinformatics.
• Neural networks and dynamic programming are being employed.
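The simplest form of border marking is an open reading frame (ORF) scan: look for ATG followed by whole codons up to the first in-frame stop codon. This is a deliberately naive sketch, much simpler than the neural network and dynamic programming methods mentioned above. It checks one strand only and ignores introns. The function name, the `min_codons` threshold and the example sequence are assumptions.

```python
import re

# ATG, then as few whole codons as possible, then the first in-frame stop codon.
ORF = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) slice indices of ORFs found from every ATG on the given strand."""
    orfs = []
    for start in (m.start() for m in re.finditer("ATG", dna)):
        match = ORF.match(dna, start)
        # Length is counted in codons, including the start and stop codons.
        if match and (match.end() - start) // 3 >= min_codons:
            orfs.append((start, match.end()))
    return orfs


dna = "GGATGAAACCCTAGTT"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])  # 2 14 ATGAAACCCTAG
```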

Organism     | Genome size (Mb = bp × 1,000,000) | Gene number | Web site
Yeast        | 13.5                              | 6,241       | http://genome-www.stanford.edu/Saccharomyces
Fruit flies  | 180                               | 13,601      | http://flybase.bio.indiana.edu
Homo sapiens | 3,000                             | 45,000      | http://www.ncbi.nlm.nih.gov/genome/guide

Proteomics
• The proteome is the sum total of an organism's proteins.
• More difficult than genomics:
  • 4 bases vs. 20 amino acids
  • Simple chemical makeup vs. complex
  • DNA can be duplicated; proteins can't
• We are entering the ‘post-genome era’
• Meaning that much has been done with genes, not that the work is over.

Proteomics…..
• The relationship between an RNA and the protein it codes for is usually far from direct
  • After translation, proteins do change
  • So the aa sequence tells us nothing about the post-translational changes
• Many proteins are not active until they are combined into a larger complex or moved to a relevant location inside or outside the cell
  • So the aa sequence only hints at these things
• Also, proteins must be handled more carefully in the lab, as they tend to change when in contact with an inappropriate material

Protein Structure Prediction
• Is one of the biggest challenges of bioinformatics, and especially of biochemistry
• There is as yet no algorithm that consistently predicts the structure of proteins

Structure Prediction methods
• Comparative modelling
  • The target protein is compared with related proteins of known structure
  • Proteins with similar sequences are searched for structures

Phylogenetics
• The taxonomic system reflects evolutionary relationships
• Phylogenetic trees depict these evolutionary relationships as a picture/graph
• Rooted trees have a single common ancestor
• Unrooted trees show only the relationships
• Phylogenetic tree reconstruction algorithms are also an area of research
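Many tree reconstruction algorithms start from pairwise distances between sequences. The sketch below computes one simple distance, the fraction of differing positions (p-distance), and picks the closest pair, which is the first pair a distance-based method such as UPGMA would join. It stops short of building a tree. The sequences are invented and assumed to be already aligned.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Invented, pre-aligned toy sequences.
taxa = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGTTCGA"}
distances = {(s, t): p_distance(taxa[s], taxa[t]) for s, t in combinations(taxa, 2)}
closest = min(distances, key=distances.get)

print(distances)  # {('A', 'B'): 0.125, ('A', 'C'): 0.375, ('B', 'C'): 0.25}
print("closest pair:", closest)  # closest pair: ('A', 'B')
```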

Applications….

Medical Implications
• Pharmacogenomics
  • Not all drugs work on all patients; some good drugs cause death in some patients
  • So by doing a gene analysis before treatment, the offending drugs can be avoided
  • Also, drugs that cause death in most patients can be used on the minority whose genes suit that drug (volunteers wanted!)
  • Customised treatment
• Gene therapy
  • Replace or supply the defective or missing gene
  • E.g. insulin, and Factor VIII for haemophilia
• Bioweapons (??)

Diagnosis of Disease
• Identifying the genes that cause a disease helps detect the disease at an early stage, e.g. Huntington's disease:
  • Symptoms: uncontrollable dance-like movements, mental disturbance, personality changes and intellectual impairment
  • Death in 10-15 years
  • The gene responsible for the disease has been identified

Drug Design
• Can take up to 15 years and $700 million
• One of the goals of bioinformatics is to reduce the time and cost involved.
• The process
  • Discovery
    • Computational methods can improve this
  • Testing

Discovery
Target identification
• Identify the molecule on which the germ relies for its survival
• Then develop another molecule, i.e. a drug, that will bind to the target
• So the germ will no longer be able to interact with the target.
• Proteins are the most common targets

Discovery…
• For example, HIV produces HIV protease, a protein that in turn cuts (cleaves) other proteins
• HIV protease has an active site where it binds to other molecules
• So an HIV drug will go and bind to that active site

Discovery…
• Lead compounds are the molecules that bind to the target protein's active site
• Traditionally, finding them has been a matter of trial and error
• Now this is moving into the realm of computers

Related Computer Technology………….

PERL
• Perl is commonly used for bioinformatics calculations because of its ability to manipulate character strings
• The default CGI language
• It started out as a scripting language but has become a fully fledged language
• It has everything now, even web service support
• http://bio.perl.org

The place of XML & Web Services
• Various markup languages are being created (Gene Markup Language etc.) to represent sequence/gene data
• Web services: program-to-program interaction, making the web application-centric as opposed to human-centric
• So this has to be platform and language independent
• Protocols like SOAP help in this regard
• In bioinformatics, various databases, platforms, languages etc. are in use
• So web services help achieve platform independence and program interaction
• Since sequence databases come in various formats and on various platforms, SOAP also helps in this regard
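To give a feel for sequence data in a markup language, the sketch below parses a small XML document with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`sequences`, `sequence`, `id`, `type`) are hypothetical, invented for this example; they do not follow any published gene markup schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are invented for this sketch.
DOCUMENT = """
<sequences>
  <sequence id="seq1" type="DNA">TTACTATA</sequence>
  <sequence id="seq2" type="DNA">TAGATA</sequence>
</sequences>
"""


def read_sequences(xml_text: str) -> dict[str, str]:
    """Return {id: sequence} for every <sequence> element in the document."""
    root = ET.fromstring(xml_text.strip())
    return {el.get("id"): (el.text or "").strip() for el in root.iter("sequence")}


print(read_sequences(DOCUMENT))  # {'seq1': 'TTACTATA', 'seq2': 'TAGATA'}
```

Because the structure is explicit in the tags, a program on any platform can read the same document without knowing how the producing database stores its data.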

Databases and Mining
• Many of the sequence databases are publicly available
• As databases are involved, various data mining techniques are used to pull the data out
• As there is a lot of literature (articles etc.) in this area, data mining is also applied to the literature.

European Molecular Biology Network (EMBnet)
• A central system for sharing, training and centralising up-to-date bio info
• Some of the EMBnet sites are:
  • SEQNET
    • http://www.seqnet.dl.ac.uk
  • UCL
    • http://www.biochem.ucl.ac.uk/bsm/dbbrowser/embnet/
  • EBI (European Bioinformatics Institute)
    • www.ebi.ac.uk

References
• Dan E. Krane and Michael L. Raymer, Basic Concepts of Bioinformatics
• Arthur M. Lesk, Intro to Bioinformatics
• T.K. Attwood & D.J. Parry-Smith, Intro to Bioinformatics
• Dr Patrick Dixon, The Genetic Revolution
• Prof David Gilbert’s Site, http://www.brc.dcs.gla.ac.uk/~drg/
