Phylip Via Emboss - Tree Building:: Phylip (Phylogeny Inference Programs)
fasta.bioch.virginia.edu/biol4230 1
Phylip 3.69
Advantages:
• Free (GNU license)
• Runs on all major platforms
• Good documentation
• Well known/widely used
• Possible to automate
• File formats supported by other packages
Disadvantages:
• Much slower than PAUP
• Search strategy less comprehensive
• Primitive command-line interface (user hostile)
• Much file renaming required
• Cannot read NEXUS files
PHYLIP Tree-building programs
• Maximum Likelihood
– dnaml, dnamlk - DNA maximum likelihood
– proml, promlk - protein maximum likelihood
– *mlk methods assume an evolutionary clock (all branches end at the same level (time))
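The clock assumption can be checked on any rooted tree. Under a clock, every root-to-tip distance is the same (the tree is ultrametric). The following is a minimal Python sketch. It assumes a hypothetical nested-list representation, in which each internal node is a list of `(subtree, branch_length)` pairs and each tip is a name. This is not how dnamlk or promlk store trees.

```python
from typing import Union

# Hypothetical tree representation (not a PHYLIP format):
# a tip is a str; an internal node is a list of (subtree, branch_length) pairs.
Tree = Union[str, list]


def root_to_tip(tree: Tree, depth: float = 0.0) -> dict:
    """Return {tip_name: distance from the root} for every tip."""
    if isinstance(tree, str):
        return {tree: depth}
    dists = {}
    for child, length in tree:
        dists.update(root_to_tip(child, depth + length))
    return dists


def is_clocklike(tree: Tree, tol: float = 1e-6) -> bool:
    """True if all tips end at the same level (within tol), as *mlk trees should."""
    d = root_to_tip(tree).values()
    return max(d) - min(d) <= tol


clock_tree = [([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.3)]
free_tree = [([("A", 0.1), ("B", 0.4)], 0.2), ("C", 0.3)]
print(is_clocklike(clock_tree))  # True
print(is_clocklike(free_tree))   # False
```

In the second tree, tip B sits further from the root than A and C, so a clock model could not produce it.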
[Figure: files read and written by a Phylip program. Inputs: infile, intree, weights, categories, fontfile. Outputs: outfile, outtree, plotfile.]
• The phylip programs re-use the same file names ("infile", "outfile", "outtree") every time a program is run. In current versions, if the input file is not present, the program prompts for it, and if the output file is already present, it warns before overwriting it.
• However, it is easy to analyse the wrong data (an old "infile") and to overwrite (or mis-name) the output file.
• Develop a protocol for ensuring that file names make sense. NEVER keep data or results under the names infile, outfile, or outtree. This can be difficult; scripts help.
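One way to follow such a protocol is to rename the fixed-name outputs immediately after every run. The following is a minimal Python sketch. The helper `stash_outputs` and the `<prefix>.outfile` naming scheme are hypothetical choices, not part of PHYLIP.

```python
import os
from pathlib import Path

# The fixed output names PHYLIP programs write to.
PHYLIP_OUTPUTS = ("outfile", "outtree")


def stash_outputs(prefix: str, workdir: str = ".") -> list:
    """Hypothetical helper: rename outfile/outtree to <prefix>.outfile/<prefix>.outtree.

    Refuses to overwrite an existing renamed file, so an earlier result is
    never silently replaced (os.rename alone would overwrite on POSIX).
    Returns the list of new paths.
    """
    moved = []
    for name in PHYLIP_OUTPUTS:
        src = Path(workdir) / name
        if not src.exists():
            continue  # this run did not produce that file
        dst = Path(workdir) / f"{prefix}.{name}"
        if dst.exists():
            raise FileExistsError(f"{dst} already exists; choose a new prefix")
        os.rename(src, dst)
        moved.append(dst)
    return moved
```

Consider a run of dnaml on a copy of gstm_n.phy. Calling `stash_outputs("gstm_n.dnaml")` afterwards would leave gstm_n.dnaml.outfile and gstm_n.dnaml.outtree. A second call with the same prefix fails instead of overwriting them.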
PHYLIP via EMBOSS
7 112
Bovine CCAAACCTGT CCCCACCATC TAACACCAAC CCACATATAC AAGCTAAACC AAAAATACCA
Mouse CCAAAAAAAC ATCCAAACAC CAACCCCAGC CCTTACGCAA TAGCCATACA AAGAATATTA
Gibbon CTATACCCAC CCAACTCGAC CTACACCAAT CCCCACATAG CACACAGACC AACAACCTCC
Orang CCCCACCCGT CTACACCAGC CAACACCAAC CCCCACCTAC TATACCAACC AATAACCTCT
Gorilla CCCCATTTAT CCATAAAAAC CAACACCAAC CCCCATCTAA CACACAAACT AATGACCCCC
Chimp CCCCATCCAC CCATACAAAC CAACATTACC CTCCATCCAA TATACAAACT AACAACCTCC
Human CCCCACTCAC CCATACAAAC CAACACCACT CTCCACCTAA TATACAAATT AATAACCTCC
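The first line gives the number of taxa (7) and sites (112); the block above shows 60 of the 112 sites per sequence. The following is a minimal Python reader for this layout. It assumes strict interleaved PHYLIP: names appear in columns 1-10 of the first block only, and later blocks list the taxa in the same order without names. `read_phylip_interleaved` is a hypothetical helper, not a PHYLIP or EMBOSS function.

```python
def read_phylip_interleaved(text: str) -> dict:
    """Hypothetical helper: parse an interleaved, strict PHYLIP alignment.

    Assumes the first line holds "ntaxa nsites", the first block carries
    10-character names, and later blocks repeat the taxa in order without
    names. Blanks inside sequence data are ignored.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nsites = (int(x) for x in lines[0].split()[:2])
    names, seqs = [], []
    for i, line in enumerate(lines[1:]):
        if i < ntax:
            # First block: name in columns 1-10, sequence after that.
            names.append(line[:10].strip())
            seqs.append("".join(line[10:].split()))
        else:
            # Later blocks: no names, taxa repeat in the same order.
            seqs[i % ntax] += "".join(line.split())
    for name, seq in zip(names, seqs):
        if len(seq) != nsites:
            raise ValueError(f"{name}: {len(seq)} sites, but header says {nsites}")
    return dict(zip(names, seqs))
```

The length check means that feeding it only the first block shown above (60 of 112 sites) raises an error rather than silently returning truncated sequences.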
PHYLIP Tree representation (NEWICK)
Each taxon label is followed by ":" and its branch length:
(Mouse:0.87231,Bovine:0.49807,(Gibbon:0.25930,(Orang:0.24166,
(Gorilla:0.12322,(Chimp:0.13846,Human:0.08571):0.06026):0.04405):0.10815):0.39538);
(Mouse:0.87558,Bovine:0.49718,(Gibbon:0.25698,(Orang:0.24477,
((Gorilla:0.16328,Chimp:0.13802):0.01842,Human:0.08495):0.06610):0.10637):0.39287);
(Mouse:0.87819,Bovine:0.49461,(Gibbon:0.25837,(Orang:0.24161,
(Chimp:0.13941,(Gorilla:0.16639,Human:0.09533):0.00616):0.06709):0.10938):0.39630);
[Figure: the first Newick tree above drawn as a tree; tips Gibbon, Orang, Gorilla, Chimp, Human, Bovine, Mouse]
Tree-analysis/display
• Tree comparison:
– (f)consense – Calculate consensus tree from
bootstraps
– (f)treedist – compare trees by "partition
distance"
• Manipulation
– retree – flip nodes, re-root, re-arrange – run
interactively
• Display
– (f)drawgram – draw "tree-like" tree
– (f)drawtree – draw unrooted tree
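The "partition distance" (symmetric difference, or Robinson-Foulds distance) counts the bipartitions of the taxa that occur in one tree but not the other. The following is a minimal Python sketch of that idea; it is not treedist's code. Trees are assumed to be hypothetical nested tuples of names with branch lengths dropped, and `splits` and `partition_distance` are illustrative names.

```python
def splits(tree) -> set:
    """Non-trivial bipartitions of an unrooted tree written as nested tuples.

    Each split is keyed by the side that does NOT contain a fixed reference
    taxon (the alphabetically first name), so the same bipartition always
    gets the same key however the tree is rooted or ordered.
    """
    found = []

    def walk(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    taxa = walk(tree)
    ref = min(taxa)
    result = set()
    for side in found:
        key = taxa - side if ref in side else side
        if 2 <= len(key) <= len(taxa) - 2:  # skip trivial (single-taxon) splits
            result.add(key)
    return result


def partition_distance(t1, t2) -> int:
    """Number of splits present in one tree but not the other."""
    return len(splits(t1) ^ splits(t2))


# Topologies of the first two Newick trees above (branch lengths dropped).
tree1 = ("Mouse", "Bovine", ("Gibbon", ("Orang", ("Gorilla", ("Chimp", "Human")))))
tree2 = ("Mouse", "Bovine", ("Gibbon", ("Orang", (("Gorilla", "Chimp"), "Human"))))
print(partition_distance(tree1, tree2))  # 2
```

The two trees differ only in whether Chimp pairs with Human or with Gorilla, so each tree has one split the other lacks.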
Running PHYLIP
infile (gstm_n.phy):
15 675
GTM1_HUMAN ---------- --ATGCCCAT GATACTGGGG TACTGGGACA TCCGCGGGCT
GTM2_HUMAN ---------- --ATGCCCAT GACACTGGGG TACTGGAACA TCCGCGGGCT
GTM3_HUMAN ATGTCGTGCG AGTCGTCTAT GGTTCTCGGG TACTGGGATA TTCGTGGGCT
GTM4_HUMAN ---------- --ATGTCCAT GACACTGGGG TACTGGGACA TCCGCGGGCT
GTM5_HUMAN ---------- --ATGCCCAT GACTCTGGGG TACTGGGACA TCCGTGGGCT
GTM1_MOUSE ---------- --ATGCCTAT GATACTGGGA TACTGGAACG TCCGCGGACT
GTM2_MOUSE ---------- --ATGCCTAT GACACTAGGT TACTGGGACA TCCGTGGGCT
GTM3_MOUSE ---------- --ATGCCTAT GACACTGGGC TATTGGAACA CCCGCGGACT
GTM5_MOUSE ATGTCATCCA AGTCT---AT GGTTCTGGGT TACTGGGATA TCCGCGGGCT
GTM1_RAT ---------- --ATGCCTAT GATACTGGGA TACTGGAACG TCCGCGGGCT
GTM2_RAT ---------- --ATGCCTAT GACACTGGGT TACTGGGACA TCCGTGGGCT
GTM3_RAT ---------- --ATGCCCAT GACACTGGGT TACTGGGACA TCCGTGGGCT
GTMU_CRILO ---------- --ATGCCTAT GATACTGGGA TACTGGAATG TCCGCGGTCT
GTMU_MESAU ---------- --ATGCCTGT GACACTGGGT TACTGGGACA TCCGTGGGCT
GTM2_CHICK ---------- --ATGGTGGT CACGTTGGGT TATTGGGACA TCCGCGGGTT
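Names such as GTM1_HUMAN and GTMU_CRILO are exactly 10 characters long. This matters because strict PHYLIP reads each name from a fixed 10-character field. The following is a minimal Python sketch of writing such a file in sequential layout, one sequence per line. `format_phylip` is a hypothetical helper.

```python
def format_phylip(seqs: dict, width: int = 10) -> str:
    """Hypothetical helper: write {name: aligned_sequence} as sequential PHYLIP.

    Names are cut or padded to the strict 10-character field and followed
    by a space (PHYLIP ignores blanks inside sequence data).
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    short = [name[:width] for name in seqs]
    if len(set(short)) != len(short):
        raise ValueError("two names are identical after truncation")
    lines = [f"{len(seqs)} {lengths.pop()}"]  # header: number of taxa, number of sites
    for name, seq in zip(short, seqs.values()):
        lines.append(f"{name:<{width}} {seq}")
    return "\n".join(lines) + "\n"
```

Checking for duplicate truncated names matters because two long names can collapse to the same 10 characters, and PHYLIP would then report two taxa with identical labels.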
Running PHYLIP - dnaml
$ fdnaml -help
Standard (Mandatory) qualifiers:
[-sequence] seqsetall File containing one or more sequence
alignments
[-intreefile] tree Phylip tree file (optional)
[-outfile] outfile [*.fdnaml] Phylip dnaml program output file
General qualifiers:
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
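`fdnaml -help` lists -sequence, -intreefile and -outfile. Naming -outfile after the alignment is one way to avoid anonymous output files. The following is a minimal Python sketch that only builds the argument list. It uses those three qualifiers plus EMBOSS's general -auto qualifier, which turns off prompting. The helper `fdnaml_command` is hypothetical; the `<stem>.fdnaml` name follows the *.fdnaml default pattern shown in the help.

```python
from pathlib import Path
from typing import Optional


def fdnaml_command(alignment: str, intree: Optional[str] = None) -> list:
    """Hypothetical helper: build an fdnaml argument list (nothing is run here).

    The output file is named after the alignment's stem, so results never
    end up in a generic 'outfile'.
    """
    stem = Path(alignment).stem
    cmd = ["fdnaml", "-sequence", alignment, "-outfile", f"{stem}.fdnaml"]
    if intree is not None:
        cmd += ["-intreefile", intree]  # optional user tree
    cmd.append("-auto")  # EMBOSS general qualifier: accept defaults, no prompts
    return cmd


print(fdnaml_command("gstm_n.phy"))
# ['fdnaml', '-sequence', 'gstm_n.phy', '-outfile', 'gstm_n.fdnaml', '-auto']
```

Running it would then be `subprocess.run(fdnaml_command("gstm_n.phy"), check=True)`, assuming EMBOSS is installed and on the PATH.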
Running PHYLIP – (f)dnaml
Nucleic acid sequence Maximum Likelihood method, version 3.63

Empirical Base Frequencies:
   A       0.25824
   C       0.25662
   G       0.25997
  T(U)     0.22516

Transition/transversion ratio =   2.000000

Ln Likelihood = -4967.04025

 Betwn        And            Length      Approx. Confid. Limits
 -----        ---            ------      ------- ------- ------
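The empirical base frequencies are the pooled A/C/G/T proportions in the alignment. The following is a minimal Python sketch of that count. It assumes gaps and ambiguity codes are simply skipped; dnaml's own handling of ambiguous bases may differ. `empirical_base_frequencies` is a hypothetical name.

```python
from collections import Counter


def empirical_base_frequencies(seqs) -> dict:
    """Hypothetical sketch: pooled A/C/G/T frequencies over aligned sequences.

    Gaps and ambiguity codes are skipped; U is counted as T.
    """
    counts = Counter()
    for seq in seqs:
        counts.update(b for b in seq.upper().replace("U", "T") if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


print(empirical_base_frequencies(["AC-GT", "AAUU"]))
# {'A': 0.375, 'C': 0.125, 'G': 0.125, 'T': 0.375}
```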
Running PHYLIP – (f)dnapars
DNA parsimony algorithm, version 3.63

requires a total of 913.000
(GTM2_CHICK:0.20337,(GTM5_MOUSE:0.07567,GTM3_HUMAN:0.06117):0.13103,
((GTM3_RAT:0.06735,((GTMU_MESAU:0.06252,(GTM2_RAT:0.03772,GTM2_MOUSE:0.01758):0.02037):0.03872,
(GTM3_MOUSE:0.06794,(GTMU_CRILO:0.04914,(GTM1_RAT:0.03111,GTM1_MOUSE:0.02815):0.01827):0.02095):0.03252):0.02700):0.02626,
((GTM5_HUMAN:0.05682,GTM2_HUMAN:0.06169):0.00978,(GTM4_HUMAN:0.04715,
GTM1_HUMAN:0.03075):0.01321):0.03090):0.08544)[0.3333];
(GTM2_CHICK:0.19762,(GTM5_MOUSE:0.07698,GTM3_HUMAN:0.05942):0.13647,
(((GTMU_MESAU:0.06103,(GTM2_RAT:0.03807,GTM2_MOUSE:0.01723):0.02135):0.03741,
(GTM3_MOUSE:0.06916,(GTMU_CRILO:0.04806,(GTM1_RAT:0.03111,GTM1_MOUSE:0.02815):0.01935):0.02106):0.03236):0.02522,
(GTM3_RAT:0.06150,(GTM2_HUMAN:0.05333,(GTM5_HUMAN:0.05213,(GTM4_HUMAN:0.04975,
GTM1_HUMAN:0.02815):0.01713):0.01605):0.04058):0.02860):0.08532)[0.3333];
(GTM2_CHICK:0.20335,(GTM5_MOUSE:0.07591,GTM3_HUMAN:0.06098):0.13099,
((GTM3_RAT:0.06487,((GTMU_MESAU:0.06237,(GTM2_RAT:0.03787,GTM2_MOUSE:0.01744):0.02037):0.03904,
(GTM3_MOUSE:0.06806,(GTMU_CRILO:0.04899,(GTM1_RAT:0.03111,GTM1_MOUSE:0.02815):0.01842):0.02098):0.03254):0.02944):0.02617,
(GTM2_HUMAN:0.05754,(GTM5_HUMAN:0.05427,(GTM4_HUMAN:0.05030,GTM1_HUMAN:0.02760):0.01481):0.01128):0.03306):0.08668)[0.3333];
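dnapars scores a tree by the number of character changes (steps) it requires; the trees above require 913. For one site on a rooted binary tree, that count can be computed with Fitch's algorithm. The following is a minimal Python sketch, not dnapars itself. It treats the gap "-" as just another state and counts every change as 1. The tree is given as hypothetical nested 2-tuples of names, and `fitch_site` and `parsimony_length` are illustrative names.

```python
def fitch_site(tree, states: dict):
    """Fitch's small-parsimony pass for one site.

    tree: nested 2-tuples of tip names (rooted, binary).
    states: {tip_name: base}. Returns (possible_states, changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    (left, lc), (right, rc) = (fitch_site(child, states) for child in tree)
    common = left & right
    if common:
        return common, lc + rc          # children can agree: no new change
    return left | right, lc + rc + 1    # disagreement costs one change


def parsimony_length(tree, alignment: dict) -> int:
    """Total changes summed over all sites (unweighted)."""
    nsites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(nsites)
    )


tree = (("A", "B"), ("C", "D"))
aln = {"A": "AA", "B": "AG", "C": "GA", "D": "GG"}
print(parsimony_length(tree, aln))  # 3
```

The first site (A, A, G, G) fits the tree with one change, while the second (A, G, A, G) needs two. The Fitch count does not depend on where the root is placed, so it also scores the unrooted tree.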
Running PHYLIP – distance methods
Running PHYLIP – (f)dnadist
15
GTM1_HUMAN 0.000000 0.111515 0.328043 0.084938 0.098515 0.202847
0.160670 0.222157 0.323212 0.195992 0.188005 0.176254 0.169073
0.202499 0.472135
GTM2_HUMAN 0.111515 0.000000 0.370425 0.122881 0.135281 0.234489
0.198432 0.246131 0.367307 0.220479 0.235718 0.162609 0.200569
0.245624 0.499002
GTM3_HUMAN 0.328043 0.370425 0.000000 0.330864 0.337744 0.395844
0.350801 0.407140 0.141206 0.397266 0.389013 0.385259 0.364146
0.386434 0.489052
GTM4_HUMAN 0.084938 0.122881 0.330864 0.000000 0.131796 0.233678
0.187505 0.236442 0.337068 0.235722 0.213963 0.182756 0.204816
0.204302 0.452330
GTM5_HUMAN 0.098515 0.135281 0.337744 0.131796 0.000000 0.230120
0.186003 0.230817 0.353029 0.215696 0.218532 0.174287 0.201916
0.216947 0.470660
GTM1_MOUSE 0.202847 0.234489 0.395844 0.233678 0.230120 0.000000
0.160969 0.116636 0.395293 0.062703 0.200109 0.200296 0.105091
0.202873 0.486157
GTM2_MOUSE 0.160670 0.198432 0.350801 0.187505 0.186003 0.160969
0.000000 0.172174 0.370651 0.159042 0.058864 0.178584 0.146716
0.103994 0.474313
. . .
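Each row above gives one sequence's distances to all 15 sequences, wrapped over several lines. fdnadist's default model is F84. As a simpler stand-in, the following minimal Python sketch computes the Jukes-Cantor corrected distance, d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites. Its values will not match the F84 matrix above. `jukes_cantor` is a hypothetical helper.

```python
import math


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Hypothetical sketch: Jukes-Cantor distance between two aligned DNA sequences.

    Columns with a gap in either sequence are skipped (pairwise deletion).
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    p = sum(a != b for a, b in pairs) / len(pairs)  # observed proportion of differences
    if p >= 0.75:
        return math.inf  # saturated: the log correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)
```

With one difference in four sites (p = 0.25), the corrected distance is about 0.304, larger than p because some changes are assumed to be hidden by multiple substitutions at the same site.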
Running PHYLIP – (f)fitch
15 Populations
Fitch-Margoliash method version 3.63

Sum of squares = $\sum_i \sum_j \frac{(\mathrm{Obs} - \mathrm{Exp})^2}{\mathrm{Obs}^2}$

Negative branch lengths not allowed
global optimization

[Figure: (f)fitch ASCII tree with internal nodes numbered 1-13 (node 13 drawn at the root); tips from top to bottom: GTM5_MOUSE, GTM3_HUMAN, GTM5_HUMAN, GTM2_HUMAN, GTM4_HUMAN, GTM1_HUMAN, GTM3_RAT, GTMU_MESAU, GTM2_RAT, GTM2_MOUSE, GTMU_CRILO, GTM3_MOUSE, GTM1_RAT, GTM1_MOUSE, GTM2_CHICK]

remember: (although rooted by outgroup) this is an unrooted tree!

Sum of squares = 0.47717

Average percent standard deviation = 4.78966

Between        And            Length
-------        ---            ------
  13             7            0.13286
   7          GTM5_MOUSE      0.07381
   7          GTM3_HUMAN      0.06739
  13             4            0.05956
   4             2            0.02688
   2          GTM5_HUMAN      0.06200
   2             3            0.00263
   3          GTM2_HUMAN      0.06785
   3             1            0.00736
   1          GTM4_HUMAN      0.05312
   . . .
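In the Fitch-Margoliash criterion above, Obs is the input distance (for example from dnadist) and Exp is the path length between the same two taxa through the tree. The following is a minimal Python sketch of that score with power 2. It assumes the tree is given as a hypothetical edge list `(node, node, length)` and the observed distances as a dict keyed by taxon pairs. Each unordered pair is counted once; summing both triangles of a symmetric matrix would just double the value. `patristic` and `fitch_margoliash_ss` are illustrative names, not fitch's code.

```python
from collections import defaultdict


def patristic(edges):
    """Path-length ("expected") distances between all nodes of a tree.

    edges: hypothetical list of (node_a, node_b, branch_length) for an unrooted tree.
    """
    adj = defaultdict(list)
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    dist = {}
    for start in adj:
        seen = {start: 0.0}
        stack = [start]
        while stack:  # depth-first walk; a tree has exactly one path between nodes
            node = stack.pop()
            for nxt, length in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + length
                    stack.append(nxt)
        dist[start] = seen
    return dist


def fitch_margoliash_ss(observed: dict, edges) -> float:
    """Sum over taxon pairs of (Obs - Exp)^2 / Obs^2."""
    exp = patristic(edges)
    return sum((obs - exp[i][j]) ** 2 / obs ** 2 for (i, j), obs in observed.items())


edges = [("A", "X", 1.0), ("B", "X", 2.0), ("C", "X", 3.0)]  # X: an internal node
obs = {("A", "B"): 3.0, ("A", "C"): 4.0, ("B", "C"): 4.0}
print(fitch_margoliash_ss(obs, edges))  # 0.0625
```

Dividing by Obs squared weights each residual relative to the size of the distance, so errors on long distances (which are measured less precisely) count for less.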
Drawing trees- (f)drawtree
[Figure: (f)drawtree drawing of the unrooted tree; tips: GTMU MESAU, GTM3 RAT, GTM2 RAT, GTM4 HUMAN, GTM1 HUMAN, GTM2 MOUSE, GTM2 HUMAN, GTMU CRILO, GTM5 HUMAN, GTM3 MOUSE, GTM1 RAT, GTM1 MOUSE, GTM3 HUMAN, GTM5 MOUSE, GTM2 CHICK]
[Figure: the same 15 sequences drawn as trees, one from fitch and one from kitsch (evolutionary clock)]
Evaluating trees- (f)consense
Consensus tree program, version 3.63
Are these settings correct? (type Y or the letter for one to change)
y
Evaluating trees- (f)consense
Extended majority rule consensus tree

CONSENSUS TREE:
the numbers on the branches indicate the number
of times the partition of the species into the two sets
which are separated by that branch occurred
among the trees, out of 3.00 trees

[Figure: ASCII drawing of the extended majority-rule consensus tree; tips from top to bottom: GTM1 RAT, GTM1 MOUSE, GTM3 MOUSE, GTMU CRILO, GTMU MESAU, GTM2 RAT, GTM2 MOUSE, GTM3 RAT, GTM1 HUMAN, GTM4 HUMAN, GTM2 HUMAN, GTM5 HUMAN, GTM3 HUMAN, GTM5 MOUSE, GTM2 CHICK; branch labels are 2.0, 2.7 or 3.0]
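The number on each branch is the (weighted) count of input trees containing that partition. The following is a minimal Python sketch of the counting step using plain majority rule: keep the splits found in more than half of the total tree weight. The extended majority rule used above also adds compatible splits with lower support; this sketch does not do that. Trees are hypothetical nested tuples of names, and `splits`, `split_support` and `majority_splits` are illustrative names.

```python
from collections import Counter


def splits(tree) -> set:
    """Non-trivial bipartitions, keyed by the side without the reference taxon.

    The reference taxon is the alphabetically first name, e.g. the split
    AB|CDE is stored as frozenset({'C', 'D', 'E'}).
    """
    found = []

    def walk(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    taxa = walk(tree)
    ref = min(taxa)
    keys = {taxa - s if ref in s else s for s in found}
    return {k for k in keys if 2 <= len(k) <= len(taxa) - 2}


def split_support(trees, weights=None) -> Counter:
    """Total weight of the trees containing each split (weights default to 1)."""
    if weights is None:
        weights = [1.0] * len(trees)
    support = Counter()
    for tree, w in zip(trees, weights):
        for s in splits(tree):
            support[s] += w
    return support


def majority_splits(trees, weights=None) -> dict:
    """Splits found in more than half of the total tree weight."""
    if weights is None:
        weights = [1.0] * len(trees)
    half = sum(weights) / 2
    return {s: n for s, n in split_support(trees, weights).items() if n > half}


t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "B"), "D", ("C", "E"))
t3 = (("A", "C"), "B", ("D", "E"))
print(sorted("".join(sorted(s)) for s in majority_splits([t1, t2, t3])))  # ['CDE', 'DE']
```

Both AB|CDE and DE|ABC appear in 2 of the 3 trees, so both survive; the other two splits appear only once.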
• The problem:
– the (f)consense program produces the consensus tree, but its branch lengths reflect the consensus frequencies, not the evolutionary branch lengths
• The solution:
– give the consensus tree to fdnaml or ffitch using the 'U' user tree option (for fdnaml, the -intreefile qualifier), which calculates branch lengths for that single tree without doing a search (fast)
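The consensus tree's branch values are counts, not lengths. One cautious step before using it as a user tree is to remove them, so they cannot be read as branch lengths; whether dnaml or fitch would use supplied lengths depends on their settings. The following is a minimal Python sketch using a regular expression. `strip_branch_lengths` is a hypothetical helper and assumes labels contain no ":" characters.

```python
import re

# ":" followed by a number (integer, decimal, or exponent form).
BRANCH_LENGTH = re.compile(r":\s*[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def strip_branch_lengths(newick: str) -> str:
    """Hypothetical helper: drop every ':length' so only topology and labels remain."""
    return BRANCH_LENGTH.sub("", newick)


print(strip_branch_lengths("((GTM1_RAT:3.0,GTM1_MOUSE:3.0):2.0,GTM3_MOUSE:3.0);"))
# ((GTM1_RAT,GTM1_MOUSE),GTM3_MOUSE);
```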
User tree – (f)dnaml
Nucleic acid sequence Maximum Likelihood method, version 3.63
Phylip for dummies
• Programs for Parsimony, Distance, and
Maximum Likelihood
• infile/outfile/outtree/intree
– either always rename them, or never use them
– Use the EMBOSS (f) programs
• (f)consense to build a consensus tree (but its branch lengths are consensus counts, not evolutionary distances)
• User tree option to calculate branch lengths for the consensus tree
• (f)drawtree for unrooted trees, (f)drawgram for rooted ("tree-like") trees