Lecture 7 Flashcards
How do we search the tree space for the maximum likelihood tree?
We need to propose different unrooted trees using NNI, SPR and TBR moves
We also need to propose different branch lengths, e.g. by multiplying each branch length by some factor
We can use hill-climbing strategies, taking small steps in tree space to find the optimum tree
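A minimal sketch of the branch-length part of such a hill climb, assuming a toy case that is not from the lecture: two aligned sequences under the Jukes-Cantor model, with `n_sites` sites of which `n_diff` differ. The function names, the starting value and the step factor 1.1 are illustrative assumptions.

```python
import math

def jc_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d between two sequences under Jukes-Cantor."""
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # P(a site differs)
    p_same = 1.0 - p_diff
    # 1/4 is the base frequency; each of the 3 possible changes has p_diff / 3
    return (n_diff * math.log(0.25 * p_diff / 3.0)
            + (n_sites - n_diff) * math.log(0.25 * p_same))

def hill_climb_branch_length(n_sites, n_diff, d=0.1, factor=1.1, tol=1e-6):
    """Propose d * factor or d / factor; keep a proposal only if it improves."""
    best = jc_log_likelihood(d, n_sites, n_diff)
    while factor - 1.0 > tol:
        improved = False
        for proposal in (d * factor, d / factor):
            ll = jc_log_likelihood(proposal, n_sites, n_diff)
            if ll > best:
                d, best, improved = proposal, ll, True
                break
        if not improved:
            # No neighbouring step helps: take smaller steps
            factor = 1.0 + (factor - 1.0) / 2.0
    return d, best

d_hat, ll_hat = hill_climb_branch_length(n_sites=100, n_diff=20)
```

In a full tree search the same accept-if-better rule would be applied to topology proposals (NNI, SPR, TBR) as well as to every branch length.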
How does the NNI move work? draw
Each internal branch of the tree connects four subtrees, its nearest neighbours. Interchanging a subtree on one side of the branch with one on the other side is an NNI (nearest-neighbour interchange). Such a rearrangement is possible for each internal branch.
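The two rearrangements around a single internal branch can be sketched as follows. This assumes a hypothetical representation in which the subtrees on one side of the branch are `a` and `b`, those on the other side are `c` and `d`, and each subtree is a leaf name or a nested tuple.

```python
def nni_neighbours(a, b, c, d):
    """Return the two NNI rearrangements of the tree ((a, b), (c, d)).

    The internal branch separates subtrees (a, b) from (c, d).
    Swapping b with c, or b with d, gives the only two alternatives.
    """
    swap_b_c = ((a, c), (b, d))
    swap_b_d = ((a, d), (c, b))
    return [swap_b_c, swap_b_d]

neighbours = nni_neighbours("A", "B", "C", "D")
```

An unrooted binary tree with n leaves has n - 3 internal branches, so it has 2(n - 3) NNI neighbours in total.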
How does SPR work? draw
It works by branch swapping through subtree pruning and regrafting: a subtree is pruned and then reattached at a different location on the tree. The steps are repeated until the optimal tree is reached.
How does TBR work? draw
It works by branch swapping through tree bisection and reconnection. The tree is broken into two subtrees by cutting an internal branch. Two branches, one from each subtree, are then chosen and rejoined to form a new tree.
Which methods can we use for testing evolutionary models?
likelihood ratio test and AIC
How can we assess confidence in the inferred parameters (phylogeny and substitution rates)?
likelihood ratios and bootstrapping
Does bootstrapping require ML?
no
How do we formulate our null hypothesis in model testing?
We ask if we can reject model H0 in favour of model H1
How do we formulate the likelihood ratio?
We assume the data evolved under a model H0, and consider a model H1 within which H0 is nested. Under mild conditions: refer to slide 18
How does the likelihood ratio test work?
We consider two models: H1, the general model, and H0, the null model. We then compute 2 * (log L(theta_1) - log L(theta_0)), where theta_1 and theta_0 are the maximum likelihood parameter estimates under H1 and H0.
If this value lies in the upper alpha tail of the chi-squared distribution, we can reject the null model.
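A minimal sketch of this test using only the standard library. The log-likelihood values in the usage line are made up for illustration, and `chi2_sf` and `likelihood_ratio_test` are hypothetical helper names. The degrees of freedom are taken to be the difference in the number of free parameters between H1 and H0.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum over half-integer gamma values
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (k + 0.5)
    return total

def likelihood_ratio_test(loglik_null, loglik_general, df, alpha=0.05):
    """Return (statistic, p-value, reject_null) for nested models."""
    stat = 2.0 * (loglik_general - loglik_null)
    p_value = chi2_sf(stat, df)
    return stat, p_value, p_value < alpha

# Illustrative numbers only: H0 = JC, H1 = GTR (8 extra free parameters)
stat, p_value, reject = likelihood_ratio_test(-1050.0, -1040.0, df=8)
```

In practice a statistics library would normally supply the chi-squared tail probability; the hand-written version here just keeps the sketch dependency-free.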
If the null model is the true model, we would falsely reject it in what proportion of the tests?
In a proportion alpha of the tests
If the null model was the false model, we expect the null model to have ____ likelihood than the true model, and we could ____ the null model only in a very ____ proportion of the tests
much lower, accept, low
What specific fit are we assessing here?
We assess the fit of H0 relative to H1. Even though H1 may be a very bad model, we may reject H0 in favour of H1, since H0 is even worse.
When comparing nested models, the simple model is obtained by ____ the parameters of the general model.
restricting
Could the simple model have free parameters here?
yes, the simple model may include free parameters.
What are type I and type II errors?
Type I: when the simple model is true but we reject it. Type II: when the simple model is false but we accept it.
Accuracy =
1 - type I error
Type I error is the ____, and is controlled by setting ____.
significance, alpha
Power =
1 - type II error
How do we generally assess the power?
By simulating under the general model and counting the number of times that H0 is accepted; the power is the proportion of simulations in which H0 is rejected.
If the null model is false and is rejected in all experiments, the power is estimated to be what?
1
The power ____ with an increasing difference between the true model and the null model.
increases
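Estimating power by simulation can be sketched with a deliberately simple stand-in for a phylogenetic model (an assumption, not from the lecture): coin flips with H0: p = 0.5 nested in H1: p free. The names `estimate_rejection_rate` and `rejects_null` are hypothetical; 3.841 is the upper 5% point of the chi-squared distribution with 1 degree of freedom.

```python
import math
import random

CHI2_CRIT_1DF_5PCT = 3.841  # upper 5% point, chi-squared with 1 df

def binom_loglik(p, k, n):
    """Log-likelihood of k successes in n trials (constant term dropped)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def rejects_null(k, n, p0=0.5):
    """Likelihood ratio test of H0: p = p0 against H1: p free."""
    p_hat = k / n  # maximum likelihood estimate under H1
    stat = 2.0 * (binom_loglik(p_hat, k, n) - binom_loglik(p0, k, n))
    return stat > CHI2_CRIT_1DF_5PCT

def estimate_rejection_rate(true_p, n=100, n_experiments=2000, seed=1):
    """Simulate under true_p and return the fraction of tests rejecting H0."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_experiments):
        k = sum(rng.random() < true_p for _ in range(n))
        rejected += rejects_null(k, n)
    return rejected / n_experiments

type_one_rate = estimate_rejection_rate(0.5)   # H0 true: should be near alpha
power_small_gap = estimate_rejection_rate(0.55)
power_large_gap = estimate_rejection_rate(0.7)
```

Simulating with `true_p` equal to 0.5 estimates the type I error rate; moving `true_p` further from 0.5 shows the power rising towards 1.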
AIC is for what type of models?
non-nested models (it can also be used for nested ones)
AIC = ?
AIC_i = -2 log L_i(theta_i) + 2 p_i, where p_i is the number of parameters and L_i is the likelihood function of model i, evaluated at its maximum likelihood estimate theta_i
How is AIC used? What is the rationale behind it?
- The AIC is calculated for each model
- Then the model with the lowest AIC is chosen
Rationale: AIC picks the model with the smallest expected Kullback-Leibler distance to the true model
Models having AIC within 1-2 of the minimum:
substantial support; should receive consideration in inference
Models having AIC within 4-7 of the minimum:
considerably less support
Models having AIC>10 above the minimum:
essentially no support
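A short sketch of the AIC comparison described above. The log-likelihoods are invented for illustration, and the parameter counts assume that only substitution-model parameters are counted (branch lengths excluded); `rank_by_aic` is a hypothetical helper name.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 log L + 2 p."""
    return -2.0 * log_likelihood + 2.0 * n_params

def rank_by_aic(models):
    """models maps name -> (maximised log-likelihood, number of parameters).

    Returns (name, AIC, difference from the minimum AIC), best model first.
    """
    scores = {name: aic(ll, p) for name, (ll, p) in models.items()}
    best = min(scores.values())
    rows = [(name, score, score - best) for name, score in scores.items()]
    return sorted(rows, key=lambda row: row[1])

# Illustrative values only
ranking = rank_by_aic({
    "JC": (-1050.0, 0),
    "HKY": (-1042.0, 4),
    "GTR": (-1040.0, 8),
})
```

The third element of each row is the difference from the minimum, which is the quantity the support thresholds above refer to.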
Slide 21**
How many branches can a rooted tree with 12 species have?
22 for a rooted binary tree (2n - 2 with n = 12); the corresponding unrooted tree has 21 (2n - 3)
If we want to test JC against GTR, can we do a likelihood ratio test?
Yes, if we perform the test on the same tree with fixed branch lengths. No, if we perform it on different trees: each tree is a different parameter, so the models are no longer nested and we need to use AIC instead.
How do we determine the confidence interval in a fixed tree given the evolutionary parameters?
We evaluate the log-likelihood at the parameter estimate, l(Theta;x), and subtract half the chi-squared critical value: l(Theta;x) - 0.5 chi-squared. The confidence interval consists of the parameter values whose log-likelihood is at least this threshold. See slide 51.
Calculating the confidence interval for complex objects such as tree topologies isn’t possible using the previous method. So what can we do? What is the limitation of this?
We can do more experiments, discard the smallest and largest 2.5% of the outcomes, and take the minimum and maximum of the rest. However, for many questions we can't repeat experiments; for instance, we can't repeat plant speciation.
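The threshold idea can be sketched for a single parameter. As an assumption for illustration (not the lecture's example), the parameter is the Jukes-Cantor distance `d` between two sequences, the interval is found by a simple grid scan, and 3.841 is the 5% critical value of the chi-squared distribution with 1 degree of freedom.

```python
import math

CHI2_CRIT_1DF_5PCT = 3.841  # upper 5% point, chi-squared with 1 df

def jc_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d between two sequences under Jukes-Cantor."""
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    p_same = 1.0 - p_diff
    return (n_diff * math.log(0.25 * p_diff / 3.0)
            + (n_sites - n_diff) * math.log(0.25 * p_same))

def likelihood_interval(n_sites, n_diff, grid_step=0.001, d_max=2.0):
    """Return (lower, estimate, upper) for d from a grid scan."""
    grid = [grid_step * i for i in range(1, int(d_max / grid_step) + 1)]
    lls = [jc_log_likelihood(d, n_sites, n_diff) for d in grid]
    best_ll = max(lls)
    # Keep every value whose log-likelihood is within 0.5 * critical value
    threshold = best_ll - 0.5 * CHI2_CRIT_1DF_5PCT
    inside = [d for d, ll in zip(grid, lls) if ll >= threshold]
    return inside[0], grid[lls.index(best_ll)], inside[-1]

lower, estimate, upper = likelihood_interval(n_sites=100, n_diff=20)
```

The resulting interval is generally not symmetric around the estimate, because the log-likelihood curve is not symmetric.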
So then what else can we do to find the CI?
We can mimic more experiments by bootstrapping, i.e. creating artificial new datasets.
How does bootstrapping work?
We run tests on new datasets obtained by random sampling with replacement from the original data. If we have enough data initially, the resampled datasets should give much the same results as the original.
How do we use bootstrapping for phylogenies based on a sequence alignment of length m?
We sample m sites at random with replacement, infer a phylogeny based on the new data, and repeat the procedure many times.
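The resampling step can be sketched as below, assuming the alignment is stored as a dictionary from sequence name to an aligned string (a representation chosen here for illustration). Tree inference on each replicate is left out; the function names are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample the m alignment columns with replacement."""
    m = len(next(iter(alignment.values())))
    columns = [rng.randrange(m) for _ in range(m)]
    # The same column indices are used for every sequence,
    # so whole sites are resampled, not individual characters
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Create n_replicates pseudo-alignments of the original length."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]

alignment = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "TCGAAC"}
replicates = bootstrap_replicates(alignment, n_replicates=100)
```

A tree would then be inferred from each replicate, and the support for a branch is the fraction of replicate trees that contain it.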
Explain each step of maximum likelihood inference.
1- infer a maximum likelihood tree:
- employ Felsenstein’s pruning algorithm to compute the likelihood for each tree and set of branch lengths
- choose the tree with branch lengths which maximise the likelihood
- do this for each substitution model and calculate its AIC
2- determine the model and tree with the highest support using AIC
3- determine the confidence interval for the substitution model parameters based on the likelihood ratios
4- determine the confidence in maximum likelihood tree using bootstrap
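The likelihood computation in step 1 can be sketched as follows. This is a minimal version of the pruning recursion under the Jukes-Cantor model, assuming a hypothetical tree representation: a leaf is a sequence name, and an internal node is a tuple of `(child, branch_length)` pairs.

```python
import math

BASES = "ACGT"

def jc_transition(t):
    """4x4 Jukes-Cantor transition probabilities for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def conditional_likelihoods(node, site):
    """Return L[x] = P(observed tips below node | state x at node)."""
    if isinstance(node, str):  # leaf: look up its observed base at this site
        return [1.0 if site[node] == b else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, branch_length in node:
        p = jc_transition(branch_length)
        child_l = conditional_likelihoods(child, site)
        for x in range(4):
            result[x] *= sum(p[x][y] * child_l[y] for y in range(4))
    return result

def log_likelihood(tree, alignment):
    """Sum the per-site log-likelihoods, with uniform base frequencies at the root."""
    names = list(alignment)
    m = len(alignment[names[0]])
    total = 0.0
    for i in range(m):
        site = {name: alignment[name][i] for name in names}
        root_l = conditional_likelihoods(tree, site)
        total += math.log(sum(0.25 * v for v in root_l))
    return total

inner = (("A", 0.1), ("B", 0.1))
tree = ((inner, 0.05), ("C", 0.2))
ll = log_likelihood(tree, {"A": "ACGT", "B": "ACGA", "C": "ACTT"})
```

A tree search would call `log_likelihood` for every proposed topology and set of branch lengths and keep the best one.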
Bootstrap is used to determine the confidence in ____, and likelihood ratio is used to determine the confidence interval for the ____.
maximum likelihood tree, substitution model
Is there a way to test how to best root a maximum likelihood tree without employing any extra information?
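No. Only the unrooted form has a meaning in a likelihood context: under time-reversible substitution models the likelihood does not depend on where the root is placed.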
Can you use the bootstrapping idea for assessing confidence in UPGMA?
Yes, if we have a sufficiently large sequence alignment, since we expect to see some variation between the resampled datasets. Bootstrapping can generally be used with different phylogeny inference methods.
What is required to infer a direction of transmission from a phylogeny?
By including more sequences and getting a more detailed tree. If the sequences from one source are contained within those of another in the tree, we could tell the direction of transmission. Even then, it is difficult to be 100% certain.