8 - Phylogenetic inference: Parsimony and Maximum Likelihood...and beyond. Flashcards
True or false? Neighbour joining doesn’t care about a molecular clock?
True!
It only requires that the distances fit a tree, and it aims to keep the total branch length minimal.
How can distance methods incorporate an optimality criterion to determine the best tree?
- Fitch-Margoliash and related least-squares methods: for every candidate tree (exhaustive search), a weighted least-squares score (Rs) is calculated to measure the difference between the observed distance matrix and the distances implied by the tree
Rs = Σ[(d - e)^2 / w]
d: pairwise distance between taxa i and j in the matrix
e: pairwise distance between i and j implied by the tree
w: weighting function that depends on the error associated with the distances (i.e. big distances have more error than small distances)
Intuitively: measures how closely the tree distances fit the distance matrix
Choose the tree with the smallest Rs
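A minimal sketch of the Rs calculation for two candidate trees. The distance values are made up for illustration, and the weight w = d^2 is an assumed choice (the flashcard does not specify w); the tree-implied distances are supplied by hand rather than fitted.

```python
def least_squares_score(observed, implied, weight=lambda d: d ** 2):
    """Rs = sum over taxon pairs of (d - e)^2 / w.

    observed, implied: dicts mapping a taxon pair to a distance.
    weight: assumed weighting function of the observed distance d.
    """
    total = 0.0
    for pair, d in observed.items():
        e = implied[pair]
        total += (d - e) ** 2 / weight(d)
    return total


# Hypothetical observed distance matrix for four taxa.
observed = {
    ("A", "B"): 0.22, ("A", "C"): 0.48, ("A", "D"): 0.51,
    ("B", "C"): 0.50, ("B", "D"): 0.49, ("C", "D"): 0.19,
}

# Hypothetical distances implied by two candidate trees.
implied_ab_cd = {
    ("A", "B"): 0.20, ("A", "C"): 0.50, ("A", "D"): 0.50,
    ("B", "C"): 0.50, ("B", "D"): 0.50, ("C", "D"): 0.20,
}
implied_ac_bd = {
    ("A", "B"): 0.45, ("A", "C"): 0.40, ("A", "D"): 0.45,
    ("B", "C"): 0.45, ("B", "D"): 0.40, ("C", "D"): 0.45,
}

scores = {
    "AB|CD": least_squares_score(observed, implied_ab_cd),
    "AC|BD": least_squares_score(observed, implied_ac_bd),
}
best_tree = min(scores, key=scores.get)  # tree with the smallest Rs
```

With these toy numbers the tree grouping A with B and C with D fits the matrix more closely, so it gets the smaller Rs.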
What is a non-informative character in parsimony method?
Characters that require the same number of changes on every possible tree, so they cannot help choose between trees.
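A commonly used shortcut for spotting these characters: a site is parsimony-informative only if at least two different states each occur in at least two taxa. The sketch below assumes a column with no gaps or ambiguity codes; the function name is illustrative.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two states each appear in two or more taxa."""
    counts = Counter(column)
    shared_states = [state for state, n in counts.items() if n >= 2]
    return len(shared_states) >= 2


# One alignment column per string, one character per taxon.
examples = {
    "AAAA": is_parsimony_informative("AAAA"),  # constant site
    "AAAG": is_parsimony_informative("AAAG"),  # singleton change
    "ACGT": is_parsimony_informative("ACGT"),  # every taxon differs
    "AAGG": is_parsimony_informative("AAGG"),  # two shared states
}
```

Only the last column can favour one tree over another; the first three cost the same number of changes on every tree.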
What are the two methods for finding the best tree?
Exhaustive search: Consider every tree and apply the optimality criterion. Guaranteed to find the best tree, but computationally infeasible for large trees (e.g. more than 11 sequences).
Heuristic tree searching: Most commonly used. First builds a starting tree, then rearranges it to find better alternatives (hill-climbing algorithm). Not guaranteed to find the optimal tree.
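To see why exhaustive search breaks down, this sketch counts the trees it would have to score. It assumes unrooted, strictly bifurcating trees with labelled tips, for which the count is (2n - 5)!! = 1 x 3 x 5 x ... x (2n - 5).

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n labelled taxa: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


tree_counts = {n: num_unrooted_trees(n) for n in (4, 5, 11, 20)}
```

For 11 taxa this already gives 34,459,425 trees, and the count keeps growing faster than exponentially after that.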
What are the two steps of sampling tree space in the heuristic method of tree searching?
1. Build a reasonably good tree
2. Change branch order to improve the optimality criterion (e.g. parsimony score or least squares) - branch swapping
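A toy sketch of the two steps as a hill-climbing loop. It uses the three possible four-taxon topologies, where each topology is one nearest-neighbour interchange away from the other two. The scores are invented parsimony-style scores (lower is better), and all names are illustrative.

```python
def hill_climb(start, neighbours, score):
    """Move to the first better neighbour until no neighbour improves."""
    current, best = start, score(start)
    improved = True
    while improved:
        improved = False
        for candidate in neighbours(current):
            candidate_score = score(candidate)
            if candidate_score < best:  # lower score is better
                current, best = candidate, candidate_score
                improved = True
                break
    return current, best


# The three unrooted topologies for taxa A, B, C, D.
QUARTETS = ["AB|CD", "AC|BD", "AD|BC"]

# Hypothetical scores for each topology.
TOY_SCORES = {"AB|CD": 12, "AC|BD": 15, "AD|BC": 14}


def nni_neighbours(topology):
    """For a quartet, one interchange reaches either of the other topologies."""
    return [t for t in QUARTETS if t != topology]


# Step 1: pick a starting tree. Step 2: swap branches while the score improves.
best_tree, best_score = hill_climb("AC|BD", nni_neighbours, TOY_SCORES.get)
```

On larger trees the same loop can stop at a local optimum, which is why the heuristic search is not guaranteed to find the best tree.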
List three branch swapping methods in order of least to most extreme
- Nearest neighbour interchange
- Subtree pruning and regrafting
- Tree bisection and reconnection
What is the biggest problem with parsimony? What is the solution to this?
Long branch attraction
In the Felsenstein zone, parsimony is statistically inconsistent: it wrongly clusters long branches together, and adding more data only strengthens support for the wrong tree.
The solution is Maximum Likelihood! Takes into account:
- Non-parsimonious as well as parsimonious paths of character evolution
- Changes are weighted differently depending on the length of the branch they occur on
- Allows for different rates of evolution in different organisms (relaxes the molecular clock)
- Weights different kinds of events differently (e.g. transitions vs. transversions)
What is the likelihood principle?
In statistics, the likelihood principle is a controversial principle of statistical inference that asserts that, given a statistical model, all of the evidence in a sample relevant to model parameters is contained in the likelihood function.
P(H | D) = P(H) x P(D | H) / P(D), where H is the hypothesis and D is the data
P(D | H) = likelihood of the hypothesis (remember the formula)
AKA: the probability of observing the real data if the hypothesis were true
How does the likelihood principle allow you to evaluate alternative hypotheses in the absence of information about the prior probabilities of the hypotheses under test?
The likelihood principle:
If P(D|H1) > P(D|H2), then we may favour H1 over H2
if we have no knowledge about the prior probabilities of the hypotheses
Example: Suppose spiders and raccoons are equally plausible occupants of your attic, and you hear noises up there. The probability of hearing a noise if spiders were moving around is much LOWER than the probability of hearing a noise if raccoons were moving around.
→ In other words: P(Noise | H_raccoons) >> P(Noise | H_spiders)
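A small sketch of the attic example using the Bayes formula above. All probabilities are invented for illustration. With equal priors, the hypothesis with the higher likelihood also ends up with the higher posterior, which is why comparing likelihoods is enough when nothing is known about the priors.

```python
def posteriors(priors, likelihoods):
    """P(H | D) = P(H) * P(D | H) / P(D) for each hypothesis H."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(D), by the law of total probability
    return {h: joint[h] / evidence for h in joint}


# Hypothetical likelihoods: P(Noise | H).
likelihoods = {"raccoons": 0.9, "spiders": 0.01}

# No prior knowledge: treat both hypotheses as equally probable.
flat_priors = {"raccoons": 0.5, "spiders": 0.5}

post = posteriors(flat_priors, likelihoods)
favoured = max(post, key=post.get)
```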
Describe the likelihood principle with regard to trees
Of all possible trees (hypotheses), which one makes observing the data the most probable?
P(data|tree1) versus P(data|tree2) versus etc.
P(D|T) = P(D,C1|T) + P(D,C2|T) + ..., where C1, C2, ... are the possible assignments of character states to the internal nodes
Give the steps for finding the maximum likelihood for a tree given an alignment
- Given current data and a tree, reconstruct all possible pathways of evolution with all possible character states at internal nodes
- For each reconstruction, calculate the probability of that sequence of events
- Sum the probabilities from all possible pathways (the law of total probability) to get the likelihood for that alignment position for that tree
- Multiply the likelihoods of all sites together (sites are assumed to be independent) to get the total likelihood
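The steps above can be sketched for a four-taxon tree. This is an illustration under several assumptions that are not in the flashcards: the Jukes-Cantor (JC69) substitution model, the same fixed length for every branch (a real analysis would also optimise branch lengths), and brute-force enumeration of internal states rather than a faster algorithm. The alignment is made up.

```python
import math
from itertools import product

STATES = "ACGT"


def jc69(i, j, t):
    """JC69 probability of state i becoming state j over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def site_likelihood(a, b, c, d, t=0.1):
    """Likelihood of one column on the tree ((a,b),(c,d)).

    Sums over every pair of states (x, y) at the two internal nodes.
    """
    total = 0.0
    for x, y in product(STATES, repeat=2):
        total += (0.25                          # equal base frequencies
                  * jc69(x, a, t) * jc69(x, b, t)
                  * jc69(x, y, t)               # internal branch
                  * jc69(y, c, t) * jc69(y, d, t))
    return total


def log_likelihood(alignment, topology):
    """Sum of per-site log-likelihoods (the log of the product over sites)."""
    (a, b), (c, d) = topology
    columns = zip(alignment[a], alignment[b], alignment[c], alignment[d])
    return sum(math.log(site_likelihood(*col)) for col in columns)


# Hypothetical alignment of four taxa.
alignment = {
    "A": "ACGTACGTAA",
    "B": "ACGTACGTAC",
    "C": "ACTTACGGAA",
    "D": "ACTTACGGAC",
}

topologies = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

log_liks = {name: log_likelihood(alignment, topo)
            for name, topo in topologies.items()}
ml_tree = max(log_liks, key=log_liks.get)
```

Logs are summed instead of multiplying raw likelihoods because the product over many sites quickly becomes too small to represent as a floating-point number.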