Introduction To Bioinformatics
Background
Definition
Bioinformatics is the computational analysis and storage of biological data
Derivation
bio: biology
informatique: French for data processing
Goal
To discover new biological insights using computers and biology
Chemoinformatics
Study and analysis of chemical information
Medical Informatics
Study, invention and implementation of structures and algorithms to improve communication, understanding and management of medical information
Mathematical Biology
More theoretical. Covers things which are not necessarily algorithmic, not necessarily molecular in nature, and not necessarily useful in analyzing collected data!
What is bioinformatics?
[Diagram: Experiment, Data, Analysis, Result, Hypothesis. Data types shown: Sequence, Structure, Function, Evolution, Pathway, Interaction, Mutation, Expression]
However!
All results of computer analysis should be verified by biologists
Bioinformatics databases
Public databases are the most important entity in bioinformatics
They store knowledge about
  Sequence, e.g. EMBL
  Structure, e.g. PDB
  Pathways, e.g. KEGG
  Interactions, e.g. DIP
  Diseases, e.g. OMIM
  And many others
Bioinformatics Tools
Hundreds of computer programs
Many freely available
Generally available on UNIX or Linux
Often interact with bioinformatics databases
Many accessible via the WWW
Some require very powerful computers to run on
  The Computational Biology Research Group provides an environment to do this
(See http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml)
Computational Biology Research Group
Analyse
Annotate
Display
Assembly
Human genome is theoretically several long strings totalling 3 billion base pairs
Assembled via hundreds of thousands of overlapping units or contigs to make a single consensus sequence
Sequences collated using information stored on ABI sequencer
Sequence assembly bioinformatics tools used to
  Automatically assemble fragments
  Hand finish using computer tools
Requires constant reassembly and rebuilds as new data comes in
E.g. PHRED/PHRAP and Staden
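The idea of merging overlapping fragments into one consensus sequence can be illustrated with a minimal sketch. This is a toy greedy overlap merger, not how PHRED/PHRAP or Staden work internally: it ignores base quality, sequencing errors and reverse strands, and the function names and the three example reads are assumptions made for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: the contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads covering the start of a longer sequence.
reads = ["AGTACGTAGT", "GTAGTAGCTG", "AGCTGCTGCT"]
print(greedy_assemble(reads))
```

Every new read can change which overlaps are best, which is one reason real assemblies need rebuilding as new data comes in.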
Analyse
Take the assembled string of nucleotides
  AGTACGTAGTAGCTGCTGCTACGTGCGCTAGCTAGTACG
  TCACGACGTAGATGCTAGCTGACTCGATGCAGACTGCTA
  GCTGCCAGCGACTCAGCTACGACTAGCATCGGCGCTAG
  CATCGGCAGC
Find genes
Train algorithm to look for features, e.g.
  Splice sites
  Start / Stop codons
  Codon frequency
  Promoters
Use existing biological information, e.g. ESTs, cDNA
Build a model of gene structure
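One of the features listed above, start and stop codons, can be scanned for directly. The sketch below is a minimal open reading frame (ORF) finder, assuming the standard start codon ATG and stop codons TAA, TAG and TGA, and looking at the forward strand only. Real gene finders combine many trained features; the function name and the `min_codons` parameter are illustrative assumptions.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for each ORF in the three forward reading frames.

    start and end are 0-based, end-exclusive positions; the ORF includes the stop codon.
    min_codons counts the codons before the stop (start codon included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical short sequence with one ORF in the third reading frame.
for start, end, orf in find_orfs("CCATGGCTTGATT"):
    print(start, end, orf)
```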
Find Function
Major challenge in bioinformatics
Search the protein sequence vs database of proteins of known function*
  Protein domains are evolutionarily conserved
  Proteins that are similar in sequence across several species are likely to have a similar function
BLAST:
  A query sequence
  Sequence database (protein or nucleotide)
  Inspection of significant hits
* There are many other methods used to imply function!
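The principle of ranking database entries by similarity to a query can be sketched without BLAST itself. The example below is not the BLAST algorithm: it swaps in a much simpler score, the fraction of shared 3-letter words (Jaccard similarity), and reports no statistical significance. The database entries, their names and the query are made-up assumptions.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_hits(query: str, database: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Score each database sequence by shared k-mers with the query, best first."""
    q = kmers(query, k)
    scores = []
    for name, seq in database.items():
        s = kmers(seq, k)
        union = q | s
        score = len(q & s) / len(union) if union else 0.0
        scores.append((name, score))
    return sorted(scores, key=lambda hit: hit[1], reverse=True)


# Hypothetical proteins of "known function" and an unknown query.
database = {
    "kinase_like": "MKVLAAGIVG",
    "transporter_like": "MSTNPKPQRK",
}
for name, score in rank_hits("MKVLAAGLVG", database):
    print(f"{name}\t{score:.2f}")
```

The top-scoring entry is only a candidate: as with real BLAST output, the hits still need inspection before a function is inferred.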
Example
[Diagram: a query (human, function unknown) is searched against a sequence database (Uniprot)]
Annotate
Results of raw gene analysis are FEATURES
Integration of features, biological rules and knowledge makes ANNOTATIONS
Write these back to the database
Automates what would otherwise take hundreds of scientists to do
Ensembl
Ensembl Genome Browser (www.ensembl.org)
Microarrays
Used in large scale functional studies
Looking for patterns of gene expression, e.g.
  Disease vs Normal
  Over time
Normalise images
  Measure and adjust for variability
Analysis of differentially expressed genes
Storing data
  Much more complicated data than sequences
Transcriptome
http://www.ebi.ac.uk/microarray/biology_intro.html#Microarrays
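The normalise-then-compare steps for a disease vs normal experiment can be sketched as follows. This is a minimal illustration assuming one intensity value per gene per array, median scaling as the normalisation, and a fixed log2 fold-change cutoff. Real analyses use replicates and statistical tests; the gene names and intensities are invented.

```python
import math
from statistics import median


def median_normalise(values: list[float]) -> list[float]:
    """Scale one array so its median intensity is 1.0, removing overall brightness differences."""
    m = median(values)
    return [v / m for v in values]


def log2_fold_changes(disease: dict[str, float], normal: dict[str, float]) -> dict[str, float]:
    """log2(disease / normal) per gene, after normalising each array separately."""
    genes = sorted(disease.keys() & normal.keys())
    d = median_normalise([disease[g] for g in genes])
    n = median_normalise([normal[g] for g in genes])
    return {g: math.log2(x / y) for g, x, y in zip(genes, d, n)}


def differentially_expressed(lfc: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Genes whose expression changed at least 2**threshold-fold in either direction."""
    return sorted(g for g, v in lfc.items() if abs(v) >= threshold)


# Hypothetical intensities: the disease array was scanned about twice as bright overall.
disease = {"geneA": 800, "geneB": 200, "geneC": 210, "geneD": 50, "geneE": 190}
normal = {"geneA": 100, "geneB": 100, "geneC": 105, "geneD": 100, "geneE": 95}

lfc = log2_fold_changes(disease, normal)
print(differentially_expressed(lfc))
```

Without the normalisation step, the brighter scan alone would make most genes look up-regulated, which is why variability is measured and adjusted for before comparing.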
Proteomics
[Workflow diagram labels: Sample, Fractionation, Mass spectrometry, Protein annotation / bioinformatics, Proteome]
Systems Biology
How everything fits together by taking a holistic view of a biological system
DNA, RNA, Proteins, Protein interactions, Networks, Cells, Organs
Denis Noble (Oxford) - Science, Vol 295, Issue 5560, 1678-1682
http://www.physiome.org/
Computational Biology Research Group
EBI - http://www.ebi.ac.uk/
NCBI - http://www.ncbi.nlm.nih.gov/
BioMart - http://www.biomart.org/
TIGR - http://www.tigr.org/
ExPASy - http://www.expasy.org/
PDB - http://www.rcsb.org/pdb/
Bioperl - http://bioperl.org/
CBRG - http://www.cbrg.ox.ac.uk
genmail@molbiol.ox.ac.uk
Computational Biology Research Group