Molecular Phylogenetics Flashcards
What do these terms mean: taxa, clades, branches, nodes, roots?
Taxa: entities being compared
Clades: groups of taxa sharing a common ancestor
Branches: reflect evolutionary change
Nodes: points where branches meet
Root: oldest point on the tree
What are the 4 aspects of a tree?
Topology (branching order)
Branch lengths (indication of genetic change)
Root (oldest point on tree)
Confidence (bootstraps/probabilities)
What models of sequence evolution are there?
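Jukes & Cantor model: assumes all nucleotides are equally frequent and all changes equally probable; K = -0.75 ln(1 - 4d/3), where d is the observed proportion of sites that differ
Problem: not all changes are equally likely; some bases are more frequent than others, and substitution rates differ between types of change
Kimura 2-parameter model: allows different rates for transitions (A↔G, C↔T) and transversions, with transitions typically the more frequent; K = -0.5 ln[(1 - 2p - q)(1 - 2q)^0.5], where p and q are the observed proportions of transitional and transversional differences
Tamura-Nei model: allows different rates for purine transitions (A↔G), pyrimidine transitions (C↔T) and transversions, and allows unequal base composition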
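Worked sketch of the two distance formulas above. The function names and the input format (two aligned, ungapped DNA strings) are assumptions made for illustration, not part of the original cards.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_differences(seq1, seq2):
    """Return (p, q): proportions of transitional and transversional differences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # A<->G or C<->T is a transition; anything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n

def jukes_cantor(d):
    """Jukes & Cantor distance from the observed proportion of differences d (needs d < 0.75)."""
    return -0.75 * math.log(1 - 4 * d / 3)

def kimura_2p(p, q):
    """Kimura 2-parameter distance from transition (p) and transversion (q) proportions."""
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

p, q = count_differences("AAAAAAAAAA", "GAAAAAAAAC")
print(jukes_cantor(p + q), kimura_2p(p, q))
```

Both corrected distances come out larger than the raw proportion of differences, because they allow for multiple changes at the same site.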
How do rates vary in molecular evolution?
Rates vary among genomes - should always use sequences from the same genome to calculate distances
Rates vary among proteins - should always use the same gene/protein to calculate distances, and the same region of it for all species
Rates vary among lineages - the rate constancy assumed by UPGMA is not a safe assumption
What is the maximum parsimony method?
‘Cladistic’ method
Starts from a set of variable character states and aims to find tree with smallest number of character state changes
Only uses ‘informative’ sites
Produces an unrooted tree; there may be more than one equally parsimonious tree
Does not give good estimates of branch lengths
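Sketch of how the number of character state changes can be counted for one candidate tree. It uses Fitch's algorithm, which the card does not name, so treat that choice as an assumption; the nested-tuple tree format and the function names are also hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a leaf name (str) or a pair (left_subtree, right_subtree);
    states maps each leaf name to its character state at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # children can agree on a state: no extra change needed here
        return shared, left_changes + right_changes
    # children disagree: one change is required on a branch below this node
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all sites of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

alignment = {"A": "AAG", "B": "AAG", "C": "ATC", "D": "ATC"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))
print(parsimony_score((("A", "C"), ("B", "D")), alignment))
```

The tree grouping A with B needs fewer changes than the tree grouping A with C, so it is the more parsimonious of the two. The first site is identical in all taxa, so it is not informative and adds nothing to either score.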
What is the UPGMA (unweighted pair-group method with arithmetic means)?
‘Phenetic’ method
Starts from a matrix of pairwise distances among taxa
Assumes perfect molecular clock
Proceeds by progressively clustering taxa with shortest distances
Doesn’t evaluate all possible trees
Produces tree rooted at midpoint
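Minimal sketch of the clustering steps above. The function name `upgma`, the distance input (a dict keyed by pairs of taxon names) and the nested-tuple output are assumptions for illustration only.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA.

    dist maps (name1, name2) pairs to distances.
    Returns (tree, joins): tree is a nested tuple, joins lists each
    new cluster with the height of its node (half the cluster distance).
    """
    sizes = {name: 1 for name in names}          # cluster -> number of taxa in it
    d = {frozenset(pair): value for pair, value in dist.items()}
    joins = []
    while len(sizes) > 1:
        # pick the two clusters separated by the shortest distance
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2        # molecular clock: node sits halfway
        new = (a, b)
        for c in sizes:
            if c not in (a, b):
                # arithmetic mean of distances, weighted by cluster size
                d[frozenset((new, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[new] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
        joins.append((new, height))
    return joins[-1][0], joins

tree, joins = upgma(["A", "B", "C"], {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0})
print(tree, joins)
```

Only one tree is ever built: each step commits to the closest pair, which is why the method does not evaluate all possible trees.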
What is the Neighbour-Joining (NJ) method?
Starts from pairwise distance matrix
Minimum evolution tree (shortest total branch length)
Either evaluate all possible trees or take a shortcut
Shortcut: start from a star tree and try all possible positions for a new branch; each time, calculate the branch lengths, sum them to get the total tree length, and choose the tree with the smallest total length
Fast - good for large data sets
Good at recovering the true tree
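Sketch of a single NJ joining step. It assumes the shortcut is the usual Saitou and Nei Q criterion, which the card does not spell out; the function name and the input format are hypothetical.

```python
from itertools import combinations

def nj_pick_pair(names, dist):
    """Choose the pair of taxa to join first from the star tree.

    dist maps (name1, name2) pairs to distances.
    Returns ((i, j), (length_i, length_j)): the chosen neighbours and
    the lengths of the branches joining each of them to the new node.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    n = len(names)
    # sum of distances from each taxon to all the others
    row_sum = {i: sum(d[frozenset((i, j))] for j in names if j != i) for i in names}

    def q(pair):
        i, j = pair
        # the pair with the lowest Q gives the smallest total branch length
        return (n - 2) * d[frozenset(pair)] - row_sum[i] - row_sum[j]

    i, j = min(combinations(names, 2), key=q)
    d_ij = d[frozenset((i, j))]
    length_i = d_ij / 2 + (row_sum[i] - row_sum[j]) / (2 * (n - 2))
    return (i, j), (length_i, d_ij - length_i)

dist = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
print(nj_pick_pair(list("abcde"), dist))
```

In this example the pair joined first is a and b, even though d and e have the shortest pairwise distance. This is the difference from UPGMA: NJ does not assume a molecular clock, and the two joined branches are allowed to have different lengths.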
What is the maximum likelihood method?
Need model of sequence evolution, need a criterion/set of criteria to choose between alternate trees, evaluate all possible trees
Allows complex models of sequence evolution
Formally evaluates different possible trees
Computer-intensive
For every possible tree, consider at each site in the alignment the probability of each possible nucleotide character state at the ancestral nodes
Combine these probabilities (summing over the possible ancestral states at a site, then taking the product across sites) to give the likelihood value for that tree
Choose tree with highest (log) likelihood
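Sketch of the likelihood calculation for one candidate tree. It assumes the Jukes & Cantor model and fixed branch lengths, and uses Felsenstein's pruning approach, which the card does not name; the tree format and function names are hypothetical. A real program would also optimise the branch lengths.

```python
import math

BASES = "ACGT"

def jc_prob(a, b, t):
    """Jukes & Cantor probability that base a becomes base b along a branch of length t."""
    e = math.exp(-4 * t / 3)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def conditional(node, site):
    """For each base x, probability of the data below node if the node has state x.

    node is a leaf name (str) or a tuple (left, t_left, right, t_right);
    site maps each leaf name to its base at this alignment column.
    """
    if isinstance(node, str):
        return {x: 1.0 if x == site[node] else 0.0 for x in BASES}
    left, t_left, right, t_right = node
    below_left = conditional(left, site)
    below_right = conditional(right, site)
    return {
        x: sum(jc_prob(x, y, t_left) * below_left[y] for y in BASES)
        * sum(jc_prob(x, z, t_right) * below_right[z] for z in BASES)
        for x in BASES
    }

def log_likelihood(tree, alignment):
    """Sum over ancestral states at each site, then add log-likelihoods across sites."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(length):
        site = {name: seq[i] for name, seq in alignment.items()}
        root = conditional(tree, site)
        total += math.log(sum(0.25 * root[x] for x in BASES))
    return total

alignment = {"A": "AAG", "B": "AAG", "C": "ATC", "D": "ATC"}
tree_1 = (("A", 0.1, "B", 0.1), 0.1, ("C", 0.1, "D", 0.1), 0.1)
tree_2 = (("A", 0.1, "C", 0.1), 0.1, ("B", 0.1, "D", 0.1), 0.1)
print(log_likelihood(tree_1, alignment), log_likelihood(tree_2, alignment))
```

The tree with the higher log likelihood is preferred. Adding logs across sites is the same as taking the product of the per-site likelihoods, but avoids very small numbers.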
How do you do bootstrapping?
Construct a pseudo-replicate alignment:
- randomly sample sites from the real alignment
- sample with replacement
- until same length as real alignment
Make a tree using the same method
Repeat many times
Record how often each partition (= internal branch) occurs across pseudoreplicates
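Sketch of the bootstrap steps above. `build_tree_splits` is a hypothetical stand-in for whatever tree-building method is in use: it is assumed to take an alignment and return the set of partitions in the resulting tree. The other names are also illustrative.

```python
import random
from collections import Counter

def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate: sample columns with replacement, same length as the original."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # the same sampled columns are used for every taxon, so sites stay intact
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

def bootstrap_support(alignment, build_tree_splits, replicates=100, seed=0):
    """Return the percentage of pseudo-replicates in which each partition occurs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        replicate = bootstrap_alignment(alignment, rng)
        counts.update(build_tree_splits(replicate))
    return {split: 100 * count / replicates for split, count in counts.items()}

alignment = {"A": "ACGT", "B": "ACGA"}
print(bootstrap_alignment(alignment, random.Random(1)))
```

Some sites will appear more than once in a pseudo-replicate and others not at all, which is what makes the replicates differ from the real alignment.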
Why use bootstrapping?
Estimate of how consistent the phylogenetic ‘signal’ is along the alignment
Longer branches likely to have higher values
Values around 75% (or higher) generally taken as ‘meaningful’
What problems can occur with phylogenetic trees?
Long branch attraction
Outgroups
What is long branch attraction?
Unequal rates of evolution cause rapidly evolving lineages to be inferred as closely related, regardless of their true evolutionary relationships
Most common with maximum parsimony
What are some examples of long branch attraction causing problems?
Herpesvirus evolution: herpesviruses tend to co-evolve with their hosts, their genes evolve ~10× faster than mammalian genes, and they occasionally acquire extra genes from the host genome
Long branch attraction made it seem that the BoHV-4 Bo17 gene did not originate from buffalo
Why are outgroups used?
Midpoint rooting - could fail with unequal rates of evolution
Outgroups useful to root trees
(All good phylogenetic methods produce unrooted trees)
An outgroup should be as close as possible to the other species, because a distant outgroup may not find the true root of the other species (long branch attraction, or other problems)
But a very close outgroup may not actually be an outgroup