Evolutionary Trees Flashcards
Mitochondrial DNA
Powerhouse of the cell
Originally independent organism captured by eukaryotic cell
Contain their own DNA
Inherited through the mother's ova
Reproduce asexually
Asexual Inheritance - Reasons for a common ancestor
Easier to study
Populations will share a common ancestor because of natural coalescence and because they went through a bottleneck (leaving Africa)
Natural coalescence happens by chance but is enhanced by selection
Far enough back in time, an asexual population will have a common ancestor
Cross-Breeding (Hominids)
Some cross-breeding between modern man and other hominids
Numbers were not large and, because the coalescence time is short, Neanderthal mitochondrial DNA vanished
DNA
Evolves sexually
Selfish Gene
Cannot build tree of life as you inherit from two parents
Each gene can be mapped to its own ancestral tree (assuming no crossover disruption)
Lots of different genes which come from different origins survive (different fitness)
Humans have DNA from Neanderthals and other hominids which have been selected for
Horizontal Gene Transfer
Bacteria and Archaea reproduce asexually
Exchange of plasmids (Loops of DNA)
Lead to antibiotic resistance
Doesn’t create a tree shape
Modelling evolution at species level
Cross-breeds have reduced fertility (1 sex is inviable/sterile)
Lack of significant cross-breeding allows for evolution to be modelled at the species level by a tree
Importance of modelling evolution
Understanding and controlling diseases like HIV, which evolved over decades (HIV evolves asexually, so we can build an evolutionary tree)
Understanding the development of cancer (disrupted genome -> tumours -> cells accumulate mutations and replicate faster than other cells)
Normally cells in higher organisms reproduce asexually
Orthologues
Genes which diverge due to speciation
Useful for understanding the relationships between species
Paralogues
Genes which diverge due to gene duplication
Interested in different types of haemoglobins
Three camps of evolutionary trees
Evolutionary Taxonomy
Phenetics or numerical taxonomy
Cladistics
Modern approaches for building evolutionary trees
aka phylogenetic trees
Need some measure of evolutionary distance
Traditionally used morphological features; now use sequence edit distance
Many plausible trees which can explain the data
All current algorithms are compromises for finding the best evolutionary trees
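As an illustration of a sequence edit distance, here is a minimal sketch of the Levenshtein distance. The unit cost for every insertion, deletion, and substitution and the function name `edit_distance` are assumptions made for this example; the cards do not say which cost model is used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumed cost model)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitute (1)
            ))
        prev = curr
    return prev[-1]
```

Applying this to every pair of sequences would give the kind of distance table that distance-based methods start from.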
Molecular clock
Use of accumulation of mutations to see when species separated.
Mutations random and strongly affected by selection pressure. Can also involve a duplication of a stretch of DNA
Need to use as large a sequence as possible
Two types of evolutionary trees
Distance-based trees
Sequence-based trees
Distance-based trees
Given a table of distances between species, we wish to find a tree explaining these distances
Sequence-based trees
Given a set of sequences, we want to find the tree that explains the data with the minimum total number of mutations summed over its links (maximum parsimony)
Perfect molecular clock
With a perfect molecular clock, evolutionary trees leading to existing species will have an ultrametric structure
For any three species there are three pairwise distances: the two largest will be identical and the third will be smaller (binary trees)
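The three-point condition can be checked directly on a distance table. The sketch below assumes the table is a symmetric list of lists indexed by species number; the function name `is_ultrametric` and the tolerance `tol` are hypothetical choices for this example.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Return True if every triple of species passes the three-point test."""
    for i, j, k in combinations(range(len(d)), 3):
        # Sort the three pairwise distances; the two largest must match
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True
```

For example, the table `[[0, 2, 6], [2, 0, 6], [6, 6, 0]]` passes, while changing one of the 6s to 5 makes it fail.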
Issues with ultrametric trees
Divergence times cannot be measured accurately, so it is unlikely that a table of distances can be modelled exactly by an ultrametric tree
UPGMA
Unweighted Pair Group Method using Arithmetic averaging
Build an ultrametric tree by clustering (from the leaves upwards)
Select the two closest subtrees, join them under a new parent node, and replace the two in the distance table with the joined subtree; repeat until a single tree remains
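The clustering loop can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `upgma`, the nested-tuple tree representation, and the input layout (a list of labels plus a symmetric list-of-lists distance table) are all assumptions for this example. Distances from a merged cluster to the others are the arithmetic average weighted by cluster size, which is what gives every leaf equal weight.

```python
from itertools import combinations

def upgma(names, dist):
    """Return (tree, root_height); tree is nested 2-tuples of leaf labels."""
    # cluster id -> (subtree, number of leaves in it)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # distances between current clusters, keyed by the pair of cluster ids
    d = {frozenset((i, j)): float(dist[i][j])
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    height = 0.0
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # the two closest subtrees
        a, b = sorted(pair)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        height = d.pop(pair) / 2          # new node sits halfway between them
        for c in clusters:
            # size-weighted arithmetic mean of the two old distances
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (
                (size_a * d_ac + size_b * d_bc) / (size_a + size_b))
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, height
```

On an ultrametric table the returned tree reproduces the distances exactly; on other tables it still returns an ultrametric tree, which is where the distortions described in the next card come from.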
Issues with UPGMA
UPGMA imposes ultrametric structure (assumes mutations accumulate at a constant rate)
If distances aren’t ultrametric then UPGMA can give poor trees
Organisms that are quite close in the distance table can still end up distorted (placed far apart) by the tree
Assumptions for UPGMA
Assumes mutations accumulate at a constant rate, giving an ultrametric structure
In reality mutations experience selection and are often “non-trivial” (genome copy mutations)
“junk” DNA
Stretches of DNA where mutations appear to accumulate at a reasonably constant rate
Not well studied or stable regions
Additivity of Distances
Ultrametricity is a very strong condition dependent on a perfect molecular clock
Additivity of Distances is a less strong condition
Doesn’t assume constant mutation rate, only that they do accumulate (Distances consistent with an evolutionary graph)
True if all mutations independent of each other
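Additivity can be tested with the standard four-point condition, which the cards do not name: for any four species, of the three ways of pairing them up, the two largest sums of pairwise distances are equal. The sketch below assumes a symmetric list-of-lists distance table; `is_additive` and `tol` are hypothetical names for this example.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Return True if every quadruple of species passes the four-point test."""
    for i, j, k, l in combinations(range(len(d)), 4):
        # The three possible pairings of four species
        sums = sorted((
            d[i][j] + d[k][l],
            d[i][k] + d[j][l],
            d[i][l] + d[j][k],
        ))
        if abs(sums[2] - sums[1]) > tol:  # two largest sums must match
            return False
    return True
```

A table built from a tree with unequal branch lengths, such as `[[0, 4, 5, 8], [4, 0, 7, 10], [5, 7, 0, 7], [8, 10, 7, 0]]`, is additive even though it is not ultrametric.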
Broken Additivity
Additivity assumes mutations just accumulate
Backward mutations or two species finding the same mutation break additivity
For additivity to be used, need to look at large parts of the genome so not affected by chance events like backward mutations
Maximum Parsimony Trees
Explain evolutionary history of set of sequences with as few mutations as possible
Most commonly used trees (sequences not distances)
Algorithms for Parsimony
Given aligned protein sequences
No efficient algorithm
Check all possible trees (branch & bound)
Feasible for moderate number of sequences (<50)
Need to cost and iterate through each tree
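One common way to cost a single tree is Fitch's algorithm, which the cards do not mention and which is used here as an assumed choice. The sketch counts the minimum number of substitutions per aligned site on a fixed tree; `fitch_column`, `parsimony_cost`, and the nested-tuple tree representation are hypothetical names and conventions for this example.

```python
def fitch_column(node, column):
    """Return (candidate ancestral states, mutations needed) for one site."""
    if not isinstance(node, tuple):        # a leaf: its observed state
        return {column[node]}, 0
    left, left_cost = fitch_column(node[0], column)
    right, right_cost = fitch_column(node[1], column)
    common = left & right
    if common:                             # children can agree: no new mutation
        return common, left_cost + right_cost
    return left | right, left_cost + right_cost + 1

def parsimony_cost(tree, seqs):
    """tree: nested 2-tuples of names; seqs: dict name -> aligned sequence."""
    names = list(seqs)
    length = len(seqs[names[0]])
    return sum(
        fitch_column(tree, {name: seqs[name][i] for name in names})[1]
        for i in range(length)
    )
```

Running `parsimony_cost` on each candidate tree gives the costs that the search compares.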
Parsimony Problem
Many different possible trees with different costs for a set of sequences
Finding tree with the smallest cost
Enumeration of trees
Method for systematically trying out all possible trees for the set of sequences
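To see why exhaustive enumeration is only feasible for a moderate number of sequences, the number of distinct unrooted binary trees can be counted. Each new leaf can be attached to any edge of the existing tree, and a tree with k-1 leaves has 2k-5 edges. The function name `count_unrooted_trees` is a hypothetical choice for this sketch.

```python
def count_unrooted_trees(n_leaves: int) -> int:
    """Number of distinct unrooted binary trees on n labelled leaves."""
    count = 1                      # one tree for three leaves or fewer
    for k in range(4, n_leaves + 1):
        count *= 2 * k - 5         # leaf k can join any of the 2k-5 edges
    return count
```

Ten sequences already give 2,027,025 trees, which is why pruning the search matters.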
Branch & Bound
Any tree generated from a partial tree will have at least as high a cost as the partial tree
If the cost of the partial tree is >= the best cost so far, no point continuing further
Reduces search space
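The pruning rule can be sketched on top of a simple enumeration that adds one leaf at a time. This is an illustration under assumptions: trees are nested tuples of sequence indices, the cost is a Fitch-style mutation count (not named in the cards), leaf 0 is fixed next to the root so that re-rootings of the same tree are not revisited, and all function names are hypothetical.

```python
def site_cost(node, column):
    """Fitch-style count: (candidate states, mutations) for one aligned site."""
    if not isinstance(node, tuple):
        return {column[node]}, 0
    left, lc = site_cost(node[0], column)
    right, rc = site_cost(node[1], column)
    common = left & right
    return (common, lc + rc) if common else (left | right, lc + rc + 1)

def cost(tree, seqs):
    """Total mutations over all sites for the leaves present in `tree`."""
    return sum(site_cost(tree, [s[i] for s in seqs])[1]
               for i in range(len(seqs[0])))

def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)                     # the edge above this node
    if isinstance(tree, tuple):
        for t in insertions(tree[0], leaf):
            yield (t, tree[1])
        for t in insertions(tree[1], leaf):
            yield (tree[0], t)

def branch_and_bound(seqs):
    """seqs: list of at least two aligned strings. Return (best tree, cost)."""
    best = {"cost": float("inf"), "tree": None}

    def search(sub, next_leaf):
        tree = (0, sub)
        c = cost(tree, seqs)
        if c >= best["cost"]:
            return                         # bound: no completion can do better
        if next_leaf == len(seqs):
            best["cost"], best["tree"] = c, tree   # complete tree, new best
            return
        for t in insertions(sub, next_leaf):       # branch: place the next leaf
            search(t, next_leaf + 1)

    search(1, 2)
    return best["tree"], best["cost"]
```

The bound is valid because adding a leaf can never lower the mutation count of the tree it is added to.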
Assessing trees (Maximum Parsimony)
Difficult to assess directly
Use bootstrapping (rebuilding the tree from perturbed data, e.g. trying different orders or duplicating certain items)
See if the maximum parsimony trees found look similar
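One common reading of bootstrapping for sequence data, assumed here, is to resample the columns of the alignment with replacement, so that some sites are duplicated and others dropped, and then rebuild the tree from each resampled alignment. The sketch covers only the resampling step; `bootstrap_alignment` and its `rng` parameter are hypothetical names.

```python
import random

def bootstrap_alignment(seqs, rng=random):
    """seqs: dict name -> aligned sequence. Return a resampled alignment."""
    length = len(next(iter(seqs.values())))
    # Draw column indices with replacement; the same draw is used for
    # every sequence so that the columns stay aligned
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in seqs.items()}
```

Building a maximum parsimony tree from each replicate and comparing the results shows which groupings appear consistently.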
Hein Algorithm
To score trees, we want to know the alignment
To do an alignment, we want to find related sequences
Both can be done together, but this is very involved