17 Compgenomics

Uploaded by

Phlip Ong

What is comparative genomics?
• Analyzing & comparing genetic material
from different species to study
evolution, gene function, and inherited
disease
• Understand what makes each species unique
Comparative Genomics
• Large scale comparison of genomes to
– understand the biology of individual genomes
– extract general principles applying to groups of
genomes.
• Assumption:
– many biological sequences, structures, and
functions are shared across organisms,
– the signal from these organisms can be increased
by combining them in analyses.
What is compared?
• Gene location
• Gene structure
– Exon number
– Exon lengths
– Intron lengths
– Sequence similarity
• Gene characteristics
– Splice sites
– Codon usage
– Conserved synteny
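
As one illustration of comparing a gene characteristic from the list above, codon usage can be tabulated for two coding sequences and set side by side. This is a minimal sketch: the function name `codon_usage` and both toy sequences are invented for illustration and are not taken from the slides.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Toy sequences (hypothetical, not real genes): same protein, different codons.
species_a = "ATGGCTGCTGCCTAA"
species_b = "ATGGCAGCAGCGTAA"

usage_a = codon_usage(species_a)
usage_b = codon_usage(species_b)

# Print the two usage profiles next to each other.
for codon in sorted(set(usage_a) | set(usage_b)):
    print(codon, round(usage_a.get(codon, 0.0), 2), round(usage_b.get(codon, 0.0), 2))
```

Both toy sequences encode the same peptide, so any difference in the printed profiles reflects codon choice rather than protein sequence.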
Large Scale Sequencing and the advent of comparative genomics
Hedges SB. Nat Rev Genet 3: 11 2002
Genome Browsers - EnsEMBL
Genome Browsers - UCSC
Comparative genomic questions
• Which genes evolve fast, and which slowly?
• What is the predominant evolutionary mode of
innovation?
– Gene birth ?
– Pseudogene creation (gene death) ?
– Remodelling ?
• Is this mode the same for different lineages?
• Can we link molecular evolution with animal
evolution?
• Can we identify regulatory regions in genomic
DNA?
Tempo and Mode of Gene Evolution
• De novo creation

• Gene fusion / fission

• Gene duplication

• Rapid sequence change

• Pseudogenisation

Comparisons of Man and Mouse follow genome sequencing
Most mouse genes have a human counterpart

Class                 Share of mouse genes
1:1 orthologues       ~80%
Other homologues      ~20%
Non-homologues        <1%

(Michele Clamp & Ewan Birney)

Cluster analysis
• BLAST-2-Sequence.
• 147 clusters with ≥ 4 members.
• 47 olfactory receptor gene clusters.
• 100 non-OR clusters.
• 25 clusters had no detectable counterpart in the human genome.

[Figure: gene-order comparison of mouse chromosome 13 (genes RU2, Hdgfrp1, Sox4, FLJ20342; labels Mm A1 to Mm A4, 1 and 22; coordinates 24471686, 26638342, 27613369, 28954952) with human chromosome 6 (genes RU2, Hdgfrp1, Prl, Sox4, FLJ20342; label Hs A1; coordinates 24415818, 22345127, 21290171)]

Emes, R.D., Goodstadt, L., Winter, E.E., Ponting, C.P. Hum Mol Genet. 2003 Apr 1;12(7):701-9
Gene family                                  Members  Function
Odorant binding proteins / aphrodisin        8        Aphrodisiac hormone.
Hydroxysteroid dehydrogenase                 7        Biosynthesis of hormonal steroids.
Class CYP4A Cytochromes P450                 7        Oxidation of compounds.
Seminal vesicle-antigen (SVA)                4        Suppression of spermatozoa motility.
Submandibular gland secretory proteins       9        Expression is androgen-dependent.
Obox, homeobox proteins                      6        Homeobox proteins.
Androgen-binding protein-α                   9        Mate selection.
Prolactin related proteins                   22       Placentation.
Cathepsin J-like enzymes                     6        Placentation.
Cystatins / Stefins                          7        Placentation.
HOX cluster                                  8        Placentation.
Class CYP2D Cytochromes P450                 5        Regulated by androgens.
MHC class I                                  8        Immunity / Mate selection?
Elafin, eppin, and antileukoproteinase 1     7        Anti-microbial.
Beta-defensin proteins (2 clusters)          5/5      Anti-microbial.
Eosinophil-associated ribonuclease           11       Pathogen response.
Claudins                                     6        Unknown function.
Class mu Glutathione S-transferases          7        Conjugation of glutathione to metabolites?
Butyrophilin homologues                      5        Unknown function.
Proline-rich proteins                        4        Unknown function.
Carboxylesterase                             6        Detoxification of xenobiotics.
Glioma pathogenesis-related protein          5        Unknown function.
Proteins of unknown function                 6        Unknown function.
Comparative analysis allows assessment of the rate of evolution.
Ka/Ks is the ratio of the number of nucleotide substitutions between two sequences that change the encoded amino acid (nonsynonymous substitutions, Ka) to the number of substitutions that give rise to a codon encoding the same amino acid (synonymous substitutions, Ks).
The Ka/Ks ratio provides a measure of the degree of amino acid change between two sequences, approximately corrected for the amount of time that has elapsed since their divergence and for the local nucleotide substitution rate.
An elevated Ka/Ks value is a marker of rapid molecular evolution.
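
The definition above can be sketched in code. The following is a deliberately simplified, Nei-Gojobori-style count and not the method used for the figures in these slides: it assumes two gap-free, in-frame, equal-length coding sequences, only scores codons that differ at exactly one position, and applies no multiple-hit correction. The function names and the toy sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon: str) -> float:
    """Synonymous sites in a codon: for each position, the fraction of the
    three possible base changes that leave the amino acid unchanged."""
    aa = CODON_TABLE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == aa:
                sites += 1 / 3
    return sites


def crude_ka_ks(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (Ka, Ks) as uncorrected proportions of differences per site."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, equal length, and in frame")
    syn_diff = nonsyn_diff = 0
    s_sites = n_sites = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (syn_sites(c1) + syn_sites(c2)) / 2  # average over the two codons
        s_sites += s
        n_sites += 3 - s
        # Simplification: codons with 2 or 3 differences are not scored.
        if sum(a != b for a, b in zip(c1, c2)) == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diff += 1
            else:
                nonsyn_diff += 1
    ka = nonsyn_diff / n_sites if n_sites else 0.0
    ks = syn_diff / s_sites if s_sites else 0.0
    return ka, ks


# Toy example (hypothetical sequences): M-A-K-G versus M-A-R-G.
ka, ks = crude_ka_ks("ATGGCTAAAGGT", "ATGGCCAGAGGA")
print("Ka =", round(ka, 3), "Ks =", round(ks, 3))
if ks > 0:
    print("Ka/Ks =", round(ka / ks, 3))
```

Real analyses would normally use an established implementation that handles multi-step codon changes and multiple substitutions at the same site.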
The mouse, rat and human PLUNC gene loci are highly conserved.
Similarity of protein sequence appears to be related to position in the locus.
[Figure: PLUNC locus with the SPLUNCs, LPLUNC1/5 and LPLUNCs groups labelled]
This may be related to their molecular evolution.
The existence of few (if any) single BPI domain proteins in lower organisms suggests that SPLUNCs have evolved from BPI/LPLUNC progenitors.
Phylogenetic Analysis
• Understand the lineage of different species.
• Have an organizing principle for sorting species into a taxonomy.
• Understand how various functions evolved.
• Understand forces and constraints on evolution.
The PLUNC/BPI protein family contains distinct branches
[Figure: phylogenetic tree with branches labelled BPI/LBP, SPLUNCs, LPLUNCs and LPLUNC1/5]
• Non-vertebrate proteins always appear in the BPI/LBP branch.
• Arabidopsis proteins: no second cysteine!
The high sequence variability between LBP and BPI
has functional consequences worthy of further study
PLUNC proteins are rapidly evolving.

Gene       Ka     Ks     Ka/Ks
SPLUNC1    0.18   0.40   0.46
SPLUNC3    0.25   0.50   0.49
LPLUNC1    0.28   0.87   0.32
LPLUNC2    0.21   0.49   0.43
LPLUNC3    0.13   0.51   0.26
LPLUNC4    0.08   0.63   0.13
LPLUNC6    0.16   0.72   0.23
BPI        0.36   0.76   0.47
LBP        0.20   0.76   0.26
PLTP       0.10   0.70   0.15
BPIL2      0.19   0.44   0.44

Median value for domain containing proteins (7,400) is 0.061 (16-83% range = 0.015-0.178).
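
To make the comparison with the background distribution explicit, the reported Ka/Ks values can be checked against the quoted median (0.061) and the upper end of the quoted 16-83% range (0.178). This is only a small sketch; the numbers are copied from the slide, while the variable names are invented here.

```python
# Ka/Ks values as reported on the slide.
reported_ka_ks = {
    "SPLUNC1": 0.46, "SPLUNC3": 0.49,
    "LPLUNC1": 0.32, "LPLUNC2": 0.43, "LPLUNC3": 0.26,
    "LPLUNC4": 0.13, "LPLUNC6": 0.23,
    "BPI": 0.47, "LBP": 0.26, "PLTP": 0.15, "BPIL2": 0.44,
}

BACKGROUND_MEDIAN = 0.061  # median for ~7,400 domain-containing proteins
BACKGROUND_UPPER = 0.178   # upper end of the quoted 16-83% range

above_median = [g for g, r in reported_ka_ks.items() if r > BACKGROUND_MEDIAN]
above_range = [g for g, r in reported_ka_ks.items() if r > BACKGROUND_UPPER]
within_range = [g for g in reported_ka_ks if g not in above_range]

print(f"{len(above_median)} of {len(reported_ka_ks)} genes exceed the median")
print(f"{len(above_range)} of {len(reported_ka_ks)} genes exceed the 16-83% range")
print("Within the range:", ", ".join(sorted(within_range)))
```

Every listed gene lies above the background median, and all but LPLUNC4 and PLTP lie above the quoted range, which is the basis for describing the family as rapidly evolving.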
Molecular evolution has produced human specific pseudogenes.
This is part of what makes us different from apes.
The SPLUNC gene BASE is such a gene.
Fig 1: Alignment of BASE cDNA sequence with human genome
(From Egland et al. PNAS Feb 2003)
Comparison of exon 6 sequences
Amino acid alignment
Modelled structure of chimp BASE, to show the frame-shifted (cyan)
and deleted (red) regions in human BASE.
Comparative sequence alignment can also be used to gain information on regulatory regions
Phylogenetic footprinting
• Assumptions:
– Mutations within functional regions of genes will accumulate more slowly than mutations in regions without sequence-specific function.
– Orthologous genes (derived from a common ancestor and separated by speciation) will be subject to the same regulatory mechanisms in different species.
• Closely or distantly related species can be used.
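
The first assumption suggests a simple way to look for candidate regulatory regions: slide a window along a pairwise alignment of orthologous non-coding sequence and report windows with unusually high identity. The sketch below assumes a pre-computed alignment given as two equal-length strings with `-` for gaps; the function name, window size, identity threshold and toy sequences are all illustrative choices, not values from the slides.

```python
def conserved_windows(aln1: str, aln2: str, window: int = 10,
                      min_identity: float = 0.9) -> list[tuple[int, float]]:
    """Return (start, identity) for each alignment window whose fraction of
    identical, non-gap columns is at least min_identity."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have the same length")
    hits = []
    for start in range(len(aln1) - window + 1):
        columns = zip(aln1[start:start + window], aln2[start:start + window])
        matches = sum(a == b and a != "-" for a, b in columns)
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Toy alignment (hypothetical): divergent flanks around a shared 10-base block.
human = "ACGTTAGC" + "TATAAAGGCG" + "GATCCTGA"
mouse = "TGCAGCTA" + "TATAAAGGCG" + "CTAGGACT"

for start, identity in conserved_windows(human, mouse):
    print(start, identity)
```

With more distantly related species the background identity is lower, so conserved blocks stand out more clearly, at the cost of missing lineage-specific elements.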
