Lecture 6 Flashcards
What is the phenetic approach based on?
sequences with short pairwise distances cluster
What is the cladistic approach based on? What is its benefit compared to the phenetic approach?
Sequences with many shared characters cluster. The evolutionary process is therefore accounted for implicitly, the method doesn't rely entirely on pairwise distance matrices, and correlations that the phenetic approach may miss are accounted for.
Cladistic approach uses —-, which finds the tree needing —?
parsimony, smallest number of mutations
What is the parsimony score of a tree?
The lowest number of mutations required to explain the sequences at the tips of the tree
What is the parsimony tree?
the tree with the lowest parsimony score: an optimisation problem
T or F: Rooted trees obtained from the same unrooted tree don't have the same parsimony score.
False, they do
What does the parsimony method do?
For n sequences of length m, it considers every unrooted tree ((2n-5)!! trees), calculates the parsimony score of each of these trees (4^(n-1)*m operations per tree when done naively), and finally outputs the unrooted tree with the lowest score.
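To get a feel for how quickly (2n-5)!! grows, here is a minimal sketch; the function name num_unrooted_trees is made up for illustration and is not from the lecture.

```python
def num_unrooted_trees(n: int) -> int:
    """Number of unrooted binary trees on n labelled tips: (2n-5)!! for n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 tips")
    count = 1
    # double factorial: 3 * 5 * ... * (2n-5); the empty product is 1 when n == 3
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 5, 10):
    print(n, num_unrooted_trees(n))
```

Already at n = 10 there are about two million trees, which is why exhaustive search is only feasible for small n.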
How can we improve the second step of the parsimony method?
Using the Fitch algorithm. For each site, we pick a cherry and compare the nucleotide sets of its two tips. If the sets are disjoint, we write down their union at the parent node and count one mutation. If the sets aren't disjoint, we write down their intersection and count no mutation. We repeat this up to the root. The number of unions, summed over all sites, is the parsimony score, i.e. the minimal number of mutations required to explain the sequences at the tips.
How many internal nodes does a rooted tree on n tips have?
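The intersection/union rule can be sketched in a few lines. This is only an illustration under assumptions: the tree is encoded as nested pairs with tip sequences as strings, and the names fitch_site and parsimony_score as well as the example sequences are made up.

```python
def fitch_site(tree, site):
    """Return (state_set, mutation_count) for one alignment column.

    tree is either a tip sequence (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {tree[site]}, 0
    left_set, left_cost = fitch_site(tree[0], site)
    right_set, right_cost = fitch_site(tree[1], site)
    common = left_set & right_set
    if common:
        # sets overlap: keep the intersection, no mutation needed here
        return common, left_cost + right_cost
    # sets are disjoint: keep the union and count one mutation
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, m):
    """Sum the per-site Fitch counts over all m sites."""
    return sum(fitch_site(tree, site)[1] for site in range(m))


# made-up example: four tips, three sites
tree = (("AAG", "AAC"), ("CAG", "CTG"))
print(parsimony_score(tree, 3))  # 3: one mutation at each site
```

Each internal node is visited once per site, which matches the (n-1)m count of set operations.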
n-1
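What is the running time of the Fitch algorithm?
O((n-1)m), i.e. O(nm): one set operation at each of the n-1 internal nodes, for each of the m sites
How is the parsimony tree found with the Fitch algorithm?
It's found by calculating the parsimony score for each unrooted tree and keeping the tree with the lowest score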
The parsimony decision problem is an — problem
NP-complete
Is the parsimony method statistically consistent or not?
It's not: back substitutions and parallel substitutions are not considered, which leads to long branch attraction
How was the origin of HIV investigated?
Incidence data only gives an impression of the dynamics since the data was first collected. Virus sequencing data from different host species allows us to infer the phylogenetic tree, which informs us about the time before 1980. ML phylogenetic tree inference was used to investigate early HIV.
What are the input and output of ML tree inference?
The input is the sequence alignment. The output is the tree that maximises the probability of the sequences given the tree and the sequence evolution parameters
The ML tree inference requires an —- model and the parameters of the model can be—–?
evolutionary, co-estimated
In ML phylogenetics, what is the parameter?
Each unrooted tree with branch lengths
What do sequences evolve according to in ML phylogenetics?
They evolve according to the parameters provided in the rate matrix Q
What is the inference in the ML method?
Determine the unrooted tree tau and the parameters Q that best explain the alignment: max L(tau, Q; D), where D is the sequence alignment
Is the substitution process typically time reversible or not?
yes
What is the running time of the likelihood calculation? How did you get it?
- multiply over all sites: O(m)
- sum over internal nucleotides at n-1 internal nodes: O(4^(n-1))
- multiply over 2n-2 branches: O(2n-2)
so overall: O(m n 4^n)
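Written out, the three factors correspond to the three nested operations in the likelihood. The notation below is an assumption, not taken from the lecture: a rooted tree with edge set E(tau), a root distribution pi, and P_{xy}(t) for the probability of going from state x to state y along a branch of length t under Q. Tip states are fixed by the alignment D, and the sum runs only over the states at the n-1 internal nodes.

```latex
L(\tau, Q; D) \;=\; \prod_{i=1}^{m} \;
  \sum_{x_1, \dots, x_{n-1} \in \{A,C,G,T\}} \;
  \pi_{x_{\mathrm{root}}}
  \prod_{(u,v) \in E(\tau)} P_{x_u x_v}(t_{uv})
```

The outer product has m factors, the sum has 4^(n-1) terms, and each term is a product over the 2n-2 branches.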
How can we improve the ML calculation?
Using Felsenstein's pruning algorithm
What is the time complexity of Felsenstein's pruning algorithm? Explain how.
- each recursion step is a summation over four times four states, which takes constant time, and there is one step per internal node: O(n)
- the recursion needs to be performed for each of the m sites: O(m)
so in total: O(nm)
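The recursion can be sketched as below. Several things here are assumptions for illustration only: the Jukes-Cantor (JC69) model stands in for a general rate matrix Q, the root distribution is uniform, the tree is encoded as nested (child, branch_length) pairs, and the names jc69, partials and likelihood as well as the example tree are made up.

```python
import math

STATES = "ACGT"


def jc69(a, b, t):
    """JC69 probability of going from state a to state b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def partials(node, site):
    """Conditional likelihood of the data below `node`, for each state at `node`.

    node is either a tip sequence (str) or a pair of (child, branch_length) tuples.
    """
    if isinstance(node, str):
        return {x: 1.0 if x == node[site] else 0.0 for x in STATES}
    result = {x: 1.0 for x in STATES}
    for child, t in node:
        below = partials(child, site)
        for x in STATES:
            # the "four times four" summation: 4 parent states x 4 child states
            result[x] *= sum(jc69(x, y, t) * below[y] for y in STATES)
    return result


def likelihood(tree, m):
    """Likelihood of an alignment with m sites, assuming a uniform root distribution."""
    total = 1.0
    for site in range(m):
        root = partials(tree, site)
        total *= sum(0.25 * root[x] for x in STATES)
    return total


# hypothetical 3-tip tree with made-up branch lengths
tree = (((("AC", 0.1), ("AG", 0.2)), 0.05), ("TC", 0.3))
print(likelihood(tree, 2))
```

Each internal node does a constant amount of work per site, so the total cost grows as nm instead of m n 4^n.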
The problem of finding a tree and branch lengths with likelihood value >= L is —.
NP-complete
In ML inference we need to consider all —– trees and try all ——. we also have to integrate over all —-.
unrooted, realistic branch lengths, internal node sequences
What are the two algorithmic methods among the phenetic inference methods?
UPGMA and the neighbour joining algorithm
In UPGMA we obtain a — tree, in neighbour joining we obtain a — tree.
rooted ultrametric, unrooted
UPGMA assumes a —, where branch lengths correspond to —.
strict molecular clock, calendar time
UPGMA and neighbour joining algorithm both have —- time complexity and are statistically —-.
polynomial, consistent
Two optimality methods of phenetic approaches for inference are?
Least squares methods
Least square methods have time complexity of — and are statistically —-.
NP-complete, consistent
Phenetic approaches disregard information beyond —-.
pairwise distances
Parsimony is a — approach and is an —- problem , and it is statistically —–.
cladistic, NP-complete, inconsistent
In parsimony what is returned?
the tree requiring the least amount of mutations is returned.
Which method is the mechanistic approach to inference? Is this method statistically consistent or not? What about its time complexity?
ML method, consistent, NP-complete
What does ML explicitly account for?
Evolution
What does the ML method return?
It returns the ML tree, with branch lengths that correspond to the number of mutations, alongside the evolutionary model parameters
In the Fitch algorithm, do we obtain all most parsimonious ancestral sequences when choosing the different nucleotides in the curly brackets?
No, we can't. Write an example like in the solution.
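One small example, which is not necessarily the one from the solution: take the rooted tree (((t1,t2),t3),t4) with tip nucleotides A, C, C, A. The Fitch sets are {A,C} at the parent of t1 and t2, {C} at the next node up, and {A,C} at the root, with score 2. The brute-force check below (a sketch with made-up names) lists every ancestral assignment that reaches the minimum.

```python
from itertools import product

STATES = "ACGT"
TIPS = ("A", "C", "C", "A")  # t1, t2, t3, t4 on the tree (((t1,t2),t3),t4)


def cost(inner1, inner2, root):
    """Number of mutations for one choice of the three internal states."""
    t1, t2, t3, t4 = TIPS
    edges = [
        (inner1, t1), (inner1, t2),      # cherry (t1, t2)
        (inner2, inner1), (inner2, t3),  # node above the cherry
        (root, inner2), (root, t4),      # root
    ]
    return sum(parent != child for parent, child in edges)


all_costs = {assign: cost(*assign) for assign in product(STATES, repeat=3)}
best = min(all_costs.values())
optimal = sorted(a for a, c in all_costs.items() if c == best)
print(best, optimal)
```

The assignment (A, A, A) needs only 2 mutations, yet it puts A at the middle node, whose Fitch set is {C}. Choosing nucleotides from the curly brackets therefore never produces it.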
Does the maximum likelihood tree construction method return estimates for the internal sequences? Give a reason.
No, we get the probability of the data given the tree, summed over the sequences at the root and the other internal nodes
Does the Fitch algorithm return the parsimony score for any phylogenetic tree and any sequence alignment?
yes
In a time reversible model, does the position of the root change the probability?
No, the model doesn't distinguish the direction of time, so we obtain the same result.
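A quick numerical check for the simplest case, two tips joined through a root. It is a sketch under assumptions: JC69 (which is time reversible) with a uniform root distribution, and made-up function names. Sliding the root along the path between the tips keeps the total branch length at 0.3 and leaves the likelihood unchanged.

```python
import math

STATES = "ACGT"


def jc69(a, b, t):
    """JC69 probability of going from state a to state b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def two_tip_likelihood(a, b, t_a, t_b):
    """Sum over the unknown root state, with branch t_a to tip a and t_b to tip b."""
    return sum(0.25 * jc69(x, a, t_a) * jc69(x, b, t_b) for x in STATES)


root_near_a = two_tip_likelihood("A", "G", 0.10, 0.20)
root_near_b = two_tip_likelihood("A", "G", 0.25, 0.05)
print(root_near_a, root_near_b)
```

Both values agree up to floating-point error, and both equal 0.25 times the probability of going from A to G over a single branch of length 0.3.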