Introduction To Bioinformatics

This document provides an introduction to bioinformatics, including:
- Defining bioinformatics as the computational analysis and storage of biological data to discover new biological insights.
- Explaining how bioinformatics was used extensively in the Human Genome Project to assemble, analyze, annotate, and display vast amounts of genomic data.
- Describing some of the common tools and databases used in bioinformatics, such as sequence assembly tools, BLAST for function prediction, and databases like UniProt, Ensembl, and UCSC Genome Browser.

Uploaded by

SunnyD
Copyright
© Attribution Non-Commercial (BY-NC)
Introduction to Bioinformatics

Stephen Taylor

Computational Biology Research Group

Background
Definition
Bioinformatics is the computational analysis and storage of biological data

Derivation
bio: biology
informatique: French for data processing

Goal
To discover new biological insights using computers and biology

Other related disciplines


Computational Biology
Broader term, more an approach as generic as biology itself

Chemoinformatics
Study and analysis of chemical information

Medical Informatics
Study, invention and implementation of structures and algorithms to improve communication, understanding and management of medical information

Mathematical Biology
More theoretical. Things which are not necessarily algorithmic, not necessarily molecular in nature, and are not necessarily useful in analyzing collected data!

What is bioinformatics?
[Diagram: Experiment, Data, Analysis, Result, Hypothesis. Analysis topics: Sequence, Structure, Function, Evolution, Pathway, Interaction, Mutation, Expression]

Why use bioinformatics


Find an answer quickly
- Most in silico biology is faster than in vitro

Massive amounts of data to analyse
- Need to make use of all information
- Not possible to do analysis by hand
- Can't organise and store information using only lab notebooks
- Automation is key

However!
- All results of computer analysis should be verified by biologists

Bioinformatics databases
Public databases are the most important entity in bioinformatics
Store knowledge about
- Sequence, e.g. EMBL
- Structure, e.g. PDB
- Pathways, e.g. KEGG
- Interactions, e.g. DIP
- Diseases, e.g. OMIM
- And many others

Can be searched in a variety of ways e.g. keyword, pattern, sequence
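The three search styles can be sketched against a tiny in-memory record list. This is only an illustration: the records, identifiers, sequences and function names below are invented and do not correspond to real EMBL or PDB entries, and real databases are queried through their own web or programmatic interfaces.

```python
import re

# Toy records invented for illustration; not real database entries.
RECORDS = [
    {"id": "REC1", "description": "oxygen storage protein (toy entry)",
     "sequence": "MAGWELVKNAWGKLEADHPG"},
    {"id": "REC2", "description": "putative kinase (toy entry)",
     "sequence": "MSTGAPLLGKSTAAR"},
    {"id": "REC3", "description": "oxygen transport protein (toy entry)",
     "sequence": "MVLTPEDKANVKAAWGKVGAH"},
]


def search_keyword(records, word):
    """Keyword search: case-insensitive match on the free-text description."""
    word = word.lower()
    return [r["id"] for r in records if word in r["description"].lower()]


def search_pattern(records, pattern):
    """Pattern search: a regular expression applied to each sequence."""
    regex = re.compile(pattern)
    return [r["id"] for r in records if regex.search(r["sequence"])]


def search_sequence(records, fragment):
    """Sequence search, reduced here to exact substring matching."""
    return [r["id"] for r in records if fragment in r["sequence"]]


print(search_keyword(RECORDS, "oxygen"))
print(search_pattern(RECORDS, r"G.{4}GK[ST]"))
print(search_sequence(RECORDS, "AAWGKVGAH"))
```

Real sequence searches tolerate mismatches and gaps rather than requiring an exact substring; the exact match here only keeps the sketch short.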


Bioinformatics Tools
- Hundreds of computer programs
- Many freely available
- Generally available on UNIX or Linux
- Often interact with bioinformatics databases
- Many accessible via the WWW
- Some require very powerful computers to run on
- The Computational Biology Research Group provides an environment to do this

The Human Genome Project


Could not have been achieved without bioinformatics

Goals
- Determine the complete sequence of the 3 billion DNA subunits
- Discover all the human genes and make them accessible for further biological study

Need to bring together and store vast amounts of information from
- Lab equipment and experiments
- Computer analysis
- Human analysis

Make visible to the world's scientists

Central Dogma of Molecular Biology

(See http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml)

Overview HGP bioinformatics


Assemble

Analyse

Annotate

Display


Assembly
Human genome is theoretically several long strings totalling 3 billion base pairs
- Assembled via hundreds of thousands of overlapping units, or contigs, to make a single consensus sequence
- Sequences collated using information stored on ABI sequencer
- Sequence assembly bioinformatics tools used to
  - Automatically assemble fragments
  - Hand finish using computer tools
- Requires constant reassembly and rebuilds as new data comes in
- E.g. PHRED/PHRAP and Staden
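The idea of joining overlapping fragments into a consensus can be sketched with a toy greedy merge. This is not how PHRED/PHRAP or Staden work: the function names and the `min_len` parameter are assumptions made for illustration, and the sketch ignores sequencing errors, quality scores, reverse strands and repeats, all of which real assemblers must handle.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest exact overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more; what is left are separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three invented reads that overlap by four bases each.
reads = ["ATGCCGTA", "CGTAACTG", "ACTGGA"]
print(greedy_assemble(reads))
```

Because the merge is greedy and exact, a repeated region can lead it to join the wrong fragments, which is one reason hand finishing remains part of the process.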


Analyse
Take the assembled string of nucleotides

AGTACGTAGTAGCTGCTGCTACGTGCGCTAGCTAGTACG
TCACGACGTAGATGCTAGCTGACTCGATGCAGACTGCTA
GCTGCCAGCGACTCAGCTACGACTAGCATCGGCGCTAG
CATCGGCAGC

Find genes

Train algorithm to look for features, e.g.
- Splice sites
- Start/stop codons
- Codon frequency
- Promoters

Use existing biological information, e.g. ESTs, cDNA
Build a model of gene structure
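Two of the listed features, start/stop codons and codon frequency, are simple enough to sketch directly. The code below is a minimal open reading frame scan on the forward strand only, with hypothetical function names and an assumed `min_codons` cut-off; it is not a trained gene finder and knows nothing about splice sites or promoters.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next start codon
    return orfs


def codon_usage(orf):
    """Count how often each codon occurs in an in-frame sequence."""
    return Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))


dna = "CCATGAAACCCGGGTAGTT"  # invented example sequence
for start, end, frame in find_orfs(dna):
    print(start, end, frame, dna[start:end], dict(codon_usage(dna[start:end])))
```

In eukaryotic genomes an ORF scan like this misses genes interrupted by introns, which is why the features above are combined with ESTs and cDNA into a model of gene structure.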

Find translated protein(s)


Translate DNA to theoretical protein (5' to 3')

>Unknown Sequence
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF
DLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLR
VDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
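Translation itself is a table lookup, codon by codon. The sketch below builds the standard genetic code from its usual compact TCAG ordering and translates all six reading frames; the function names are hypothetical, unknown codons are reported as "X" by assumption, and the short input DNA is an invented example rather than the sequence behind the protein above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate one reading frame (0, 1 or 2) of a DNA string read 5' to 3'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X")  # "X" for codons with ambiguous bases
        for pos in range(frame, len(dna) - 2, 3)
    )


def reverse_complement(dna):
    """Return the opposite strand, also written 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame(dna):
    """Translate the three frames of each strand."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            frames[(strand, frame)] = translate(seq, frame)
    return frames


print(translate("ATGGTGCTGTCTCCTGCCTAA"))
print(six_frame("ATGGTGCTGTCTCCTGCCTAA"))
```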



Find Function
Major challenge in bioinformatics
- Search the protein sequence vs database of proteins of known function*
- Protein domains are evolutionarily conserved
- Proteins that are similar in sequence across several species are likely to have a similar function
- BLAST:
  - A query sequence
  - Sequence database (protein or nucleotide)
  - Inspection of significant hits
- There are many other methods used to imply function!

* Many of the databases contain errors



Example
Query: Human, Unknown
Sequence database: UniProt
Search results:
- Chimp, Myoglobin
- Pig, Myoglobin
- Mouse, Myoglobin

Putative function = Human, Myoglobin
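The reasoning in this example, assigning the function of the most similar known protein, can be sketched with a crude similarity score. This is not BLAST: it uses the fraction of shared 3-letter words instead of seeded local alignment with significance statistics, and every sequence, name, threshold and function below is invented for illustration.

```python
def kmers(seq, k=3):
    """Set of all overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard index of the two k-mer sets: 1.0 for identical sets, 0.0 for disjoint ones."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def putative_function(query, database, threshold=0.3):
    """Return (function, hit name, score) for the best hit, or (None, None, score) below threshold."""
    best_name, best_score = None, 0.0
    for name, (seq, _function) in database.items():
        score = similarity(query, seq)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None, None, best_score
    return database[best_name][1], best_name, best_score


# Invented toy sequences standing in for database entries of known function.
TOY_DB = {
    "chimp_toy": ("MAGWELVKNAWGKLEADHPG", "myoglobin"),
    "pig_toy": ("MAGWELVKNAWGKLEAEHPG", "myoglobin"),
    "kinase_toy": ("MSTGAPLLGKSTAAR", "kinase"),
}

print(putative_function("MAGWELVKNAWGKLEADHPA", TOY_DB))
```

Any function assigned this way is only a hypothesis: it inherits whatever errors the database annotations contain and should be checked experimentally.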



Annotate
- Results of raw gene analysis are FEATURES
- Integration of features, biological rules and knowledge makes ANNOTATIONS
- Write these back to the database
- Automates what would have taken hundreds of scientists to do

Ensembl
Ensembl Genome Browser (www.ensembl.org)


UCSC Genome Browser http://genome.ucsc.edu/


And this is just the start


Bioinformatics is now in the post-genomics era
- Transcriptomics
- Epigenomics
- Proteomics
- Comparative genomics
- Metabolic pathways
- Regulatory networks
- Virtual cells
- Modelling systems and organs

Microarrays
Used in large scale functional studies
Looking for patterns of gene expression, e.g.
- Disease vs Normal
- Over time

Bioinformatics used to
- Normalise images
- Measure and adjust for variability
- Analyse differentially expressed genes
- Store data: much more complicated data than sequences

Transcriptome

http://www.ebi.ac.uk/microarray/biology_intro.html#Microarrays
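A disease-versus-normal comparison can be sketched in a few lines once spot intensities have been extracted from the images. The approach here, scaling each array by its median and flagging genes by log2 fold change, is only one simple choice assumed for illustration; the gene names, intensities and the threshold are invented, and real analyses use replicates and statistical tests.

```python
import math
from statistics import median


def median_normalise(intensities):
    """Scale one array so its median intensity is 1.0, adjusting for overall brightness."""
    m = median(intensities.values())
    return {gene: value / m for gene, value in intensities.items()}


def log2_fold_changes(disease, normal):
    """log2(disease / normal) per gene after normalising each array separately."""
    d, n = median_normalise(disease), median_normalise(normal)
    return {gene: math.log2(d[gene] / n[gene]) for gene in d if gene in n}


def differentially_expressed(disease, normal, threshold=1.0):
    """Genes whose expression changed at least 2**threshold-fold in either direction."""
    changes = log2_fold_changes(disease, normal)
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= threshold)


# Invented intensities; the disease array is roughly twice as bright overall.
normal = {"geneA": 100, "geneB": 200, "geneC": 400, "geneD": 200, "geneE": 200}
disease = {"geneA": 1600, "geneB": 400, "geneC": 800, "geneD": 400, "geneE": 100}

print(log2_fold_changes(disease, normal))
print(differentially_expressed(disease, normal))
```

The sketch assumes every intensity is positive; background-subtracted values of zero or below would need handling before taking logarithms.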


Proteomics
[Workflow diagram: Sample, Fractionation, Mass spectrometry, Protein Annotation / Bioinformatics. Label: Proteome]

Systems Biology
How everything fits together by taking a holistic view of a biological system
- DNA
- RNA
- Proteins
- Protein interactions
- Networks
- Cells
- Organs

Will require a huge amount of data and corresponding computational infrastructure

Systems biology examples


Dennis Bray (Cambridge): http://www.pdn.cam.ac.uk/groups/comp-cell/
- Modelling chemotaxis in bacteria

Denis Noble (Oxford): Science, Vol 295, Issue 5560, 1678-1682
- Modelling the Heart--from Genes to Cells to the Whole Organ
- http://www.physiome.org/

List of useful websites


- WWW has driven a lot of bioinformatics research
- Low powered computers access very high powered computers
- Important to use all resources available to do research
- Hundreds of sites available

EBI - http://www.ebi.ac.uk/
NCBI - http://www.ncbi.nlm.nih.gov/
BioMart - http://www.biomart.org/
TIGR - http://www.tigr.org/
ExPASy - http://www.expasy.org/
PDB - http://www.rcsb.org/pdb/
Bioperl - http://bioperl.org/
CBRG - http://www.cbrg.ox.ac.uk

genmail@molbiol.ox.ac.uk
