Phylogenetic Analysis
What Is Phylogeny?
• Phylogeny is the study of relationships among different groups of
organisms and their evolutionary development.
• Phylogeny attempts to trace the evolutionary history of all life on the
planet.
• It is based on the phylogenetic hypothesis that all living organisms
share a common ancestry.
• The relationships among organisms are depicted in what is known as a
phylogenetic tree.
• Relationships are determined by shared characteristics, as indicated
through the comparison of genetic and anatomical similarities.
Phylogenetic Tree
• A phylogenetic tree (called a cladogram when it shows only the branching
pattern) is a schematic diagram used as a visual illustration of proposed
evolutionary relationships among taxa.
• Phylogenetic trees are diagrammed based on assumptions of cladistics, or
phylogenetic systematics. Cladistics is a classification system that
categorizes organisms based on shared traits, as determined by genetic,
anatomical, and molecular analysis.
• The main assumptions of cladistics are:
1. All organisms descend from a common ancestor.
2. New lineages arise when an existing population splits into two groups.
3. Over time, lineages experience changes in characteristics.
Phylogeny: Terms
• Phylogeny: the evolutionary history of a group of organisms; also the
study of the genealogy and evolutionary history of a taxonomic group.
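• Root: the common ancestor of all taxa in the tree
• Ingroup: the group of taxa under study
• Outgroup: a taxon outside the ingroup, used to root the tree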
Tree structure
• A tree can also be presented in a text format (Newick): (A,(B,(C,D)));
• The graphic structure can be difficult to interpret (2-dimensional)
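The text format can be read back into a nested structure with a few lines of code. The following is a minimal sketch, assuming the text format meant here is Newick with comma-separated siblings and no branch lengths or internal labels; the function names parse_newick and leaves are illustrative, not part of any library.

```python
def parse_newick(text):
    """Parse a simple Newick string (names and parentheses only) into nested tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")"
            return tuple(children)
        # Leaf: read the name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    tree = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing characters")
    return tree


def leaves(tree):
    """Return the leaf names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("(A,(B,(C,D)));")
print(tree)          # ('A', ('B', ('C', 'D')))
print(leaves(tree))  # ['A', 'B', 'C', 'D']
```

Each pair of parentheses becomes one internal node, so the nesting depth of the tuples mirrors the branching order of the tree.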
Methods
• Distance matrix
• Maximum parsimony
• Maximum likelihood
Distance matrix
• A distance matrix is calculated from the sequence dataset
• Tree-building algorithms: Fitch-Margoliash, Neighbor-Joining, or UPGMA
• Simple, finds only one tree
• Somewhat old-fashioned (OK if your alignment is good and
evolutionary distances are short)
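The two steps above (compute a distance matrix, then build one tree from it) can be sketched as follows. This is an illustration under stated assumptions: distances are uncorrected p-distances (proportion of differing sites), the algorithm is UPGMA, and the four toy sequences and all function names are made up for the example.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Map each unordered pair of taxon names to its p-distance."""
    return {frozenset((i, j)): p_distance(seqs[i], seqs[j])
            for i, j in combinations(seqs, 2)}


def upgma(seqs):
    """Cluster taxa with UPGMA and return the topology as nested tuples."""
    dist = distance_matrix(seqs)
    size = {name: 1 for name in seqs}  # number of leaves in each cluster
    while len(size) > 1:
        # Join the two clusters with the smallest distance.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair, key=str)  # fixed order keeps the output reproducible
        merged = (a, b)
        # Distance from the new cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for other in size:
            if other in pair:
                continue
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (
                (size[a] * d_a + size[b] * d_b) / (size[a] + size[b]))
        del dist[pair]
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Toy alignment (hypothetical data).
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA", "D": "TCGAACTT"}
print(upgma(seqs))  # (('A', 'B'), ('C', 'D'))
```

The procedure always returns exactly one tree, which matches the "finds only one tree" property noted above. UPGMA additionally assumes a roughly constant rate of change across lineages, which Neighbor-Joining does not.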
Maximum parsimony
• Finds the optimum tree by minimizing the number of evolutionary
changes
• Makes no assumptions about the evolutionary pattern
• May oversimplify evolution
• May produce several equally good trees
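Counting the evolutionary changes a tree requires can be sketched with Fitch's small-parsimony algorithm. The code below is a minimal illustration, assuming rooted binary trees written as nested 2-tuples and equally weighted changes; the toy sequences and function names are made up for the example.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one site on a binary tree.

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its character state at this site
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:
            return common  # children agree on at least one state: no change
        changes += 1       # children disagree: one change is required
        return left | right

    state_set(tree)
    return changes


def parsimony_score(tree, seqs):
    """Sum the Fitch score over all alignment columns."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_score(tree, {name: seq[i] for name, seq in seqs.items()})
               for i in range(length))


# Toy alignment (hypothetical data) and the three possible groupings of four taxa.
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA", "D": "TCGAACTT"}
candidates = [(("A", "B"), ("C", "D")),
              (("A", "C"), ("B", "D")),
              (("A", "D"), ("B", "C"))]
for tree in candidates:
    print(tree, parsimony_score(tree, seqs))
# (('A', 'B'), ('C', 'D')) 5
# (('A', 'C'), ('B', 'D')) 7
# (('A', 'D'), ('B', 'C')) 6
```

Here a single tree has the lowest score, but with other data two or more topologies can tie, which is how parsimony ends up with several equally good trees.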
Maximum likelihood
• The best tree is found under an assumed model of evolution
• Nucleotide models are currently more advanced than amino acid models
• Programs are computationally demanding
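The role of the evolutionary model can be illustrated on the smallest possible case: two sequences joined by a single branch. The sketch below assumes the Jukes-Cantor (JC69) nucleotide model, which is not named in the slides and is chosen here only because it is the simplest; the sequences and function names are hypothetical. Real programs maximize a likelihood of this kind over whole trees, which is far more expensive.

```python
import math


def jc69_log_likelihood(a, b, d):
    """Log-likelihood of two aligned sequences separated by distance d under JC69.

    d is the expected number of substitutions per site (d > 0).
    """
    n_diff = sum(x != y for x, y in zip(a, b))
    n_same = len(a) - n_diff
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability that a site keeps its base
    p_diff = 0.25 - 0.25 * decay  # probability of one specific different base
    # Each base has equilibrium frequency 1/4 under JC69.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def jc69_distance(a, b):
    """Closed-form maximum-likelihood distance under JC69."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two toy sequences (hypothetical data) differing at 2 of 20 sites.
a = "ACGTACGTACGTACGTACGT"
b = "ACGTACGAACGTACCTACGT"

# Brute-force search over candidate distances, then compare with the formula.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda d: jc69_log_likelihood(a, b, d))
print(best, jc69_distance(a, b))  # the two values should agree closely
```

Even in this one-branch case the search evaluates the likelihood many times, which hints at why full maximum likelihood tree programs need so much computing capacity.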
Algorithms used for tree searching
• Exhaustive search: all possibilities → best tree → requires lots of time
and computer resources
• Branch and Bound: a tree is built according to the model given → the
next tree is compared to it while it is being constructed → if the first
tree is better, the second tree is abandoned → third tree… → best
possible tree
• Heuristic Search: only the most likely options are examined → saves
time and resources, but does not always result in the best tree
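Why exhaustive search quickly becomes impractical can be seen by counting the candidate trees. For n labelled taxa, the number of distinct unrooted bifurcating topologies is (2n-5)!! = 1 × 3 × 5 × … × (2n-5). A short sketch (the function name is illustrative):

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating topologies for n labelled taxa: (2n-5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
# 4 taxa give 3 trees, 5 give 15, and 10 already give 2027025.
```

The count grows faster than exponentially, so exhaustive search and branch and bound are realistic only for small datasets, while heuristic search is the usual choice for larger ones.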
Bootstrapping
• Evaluates the reliability of the tree
• n trees are built from resampled data (n = 100/1000/5000)
→ Counts how many times a certain branch is reproduced
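The resampling step can be sketched as follows. Each replicate draws alignment columns at random with replacement, and support is the fraction of replicates in which a grouping reappears. To keep the example short, full tree building is replaced by a deliberately simple stand-in (the pair of taxa with the smallest p-distance); this stand-in, the toy data, and all function names are assumptions made for illustration.

```python
import random


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_alignment(seqs, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(seqs.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in seqs.items()}


def closest_pair(seqs):
    """Stand-in for tree building: the pair of taxa with the smallest distance."""
    names = sorted(seqs)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return frozenset(min(pairs, key=lambda p: p_distance(seqs[p[0]], seqs[p[1]])))


def bootstrap_support(seqs, group, n_replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which `group` is recovered."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(closest_pair(bootstrap_alignment(seqs, rng)) == frozenset(group)
               for _ in range(n_replicates))
    return hits / n_replicates


# Toy alignment (hypothetical data).
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA", "D": "TCGAACTT"}
print(bootstrap_support(seqs, ("A", "B"), n_replicates=1000))
```

In a real analysis the stand-in would be replaced by the same tree-building method used for the original tree, and support would be reported for every branch rather than for one pair.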