Lecture 5 Flashcards
What is a tree in the context of phylogeny?
It's a graph consisting of nodes and branches without a loop
What is an unrooted phylogenetic tree? Draw one
a tree with two types of nodes:
* tip/leaf: node with 1 branch attached
* internal node: node with 3 branches attached
What is a rooted phylogenetic tree? Draw one
a tree in which one branch is subdivided by a new node, the root
Can unrooted trees be rooted? If yes, how?
Yes, with an outgroup: an organism chosen to be very distantly related to the remaining organisms in the tree. The branch ending in the outgroup is subdivided by the root node.
Each branch may have a length of ?
> 0
What is a pendant branch?
branch attached to a tip
What is a cherry?
a pair of tips separated by only one internal node
What is a caterpillar tree?
a tree with only one cherry
What is a monophyletic group or clade?
All descendants of a common ancestor
What is an ultrametric tree? Show on a tree
A rooted tree in which the sum of branch lengths from the root to any tip is the same
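The ultrametric condition can be checked mechanically. Below is a minimal sketch, assuming a hypothetical nested representation in which a tip is a string and an internal node is a list of (child, branch length) pairs. The names tip_depths and is_ultrametric are illustrative and not from the lecture.

```python
def tip_depths(node, depth=0.0):
    """Return {tip label: summed branch length from the root to that tip}."""
    if isinstance(node, str):            # a tip is just its label
        return {node: depth}
    depths = {}
    for child, length in node:           # internal node: list of (child, length)
        depths.update(tip_depths(child, depth + length))
    return depths


def is_ultrametric(tree, tol=1e-9):
    """True if every tip is at the same distance from the root (up to tol)."""
    depths = tip_depths(tree).values()
    return max(depths) - min(depths) <= tol


# Hypothetical example tree, in Newick notation ((B:1,C:1):1,A:2)
example_tree = [([("B", 1.0), ("C", 1.0)], 1.0), ("A", 2.0)]
print(tip_depths(example_tree))
print(is_ultrametric(example_tree))
```

The tolerance tol is there only because branch lengths are floating-point numbers.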
What is a polytomy?
The definition of a phylogenetic tree is extended so that an internal node may have more than 3 branches attached. Such a node is called a polytomy.
What is the string representation of this tree? refer to slide 8
(((B:1, C:1):1, A:2):1, D:3)
What is the string representation of this tree? refer to slide 9
((A:2, D:4):1, B:1, C:1)
Can a tree have multiple Newick representations?
yes, but they are all equivalent
Only ? contributes to branch lengths
vertical distances along the evolutionary time axis
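To see why differently written Newick strings are equivalent, one can parse them and ignore the order of children. The sketch below is a deliberately small parser written for these notes, not a library API: it assumes plain tip labels and branch lengths only, with no quoting or comments. The function names parse_newick and canonical are illustrative.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (label, length, children) tuples."""
    s = text.replace(" ", "").rstrip(";")
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def node():
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1                          # consume "("
            children.append(node())
            while peek() == ",":
                pos += 1                      # consume ","
                children.append(node())
            if peek() != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                          # consume ")"
        start = pos
        while peek() and peek() not in ":,()":
            pos += 1
        label = s[start:pos]                  # empty for unlabeled internal nodes
        length = None
        if peek() == ":":
            pos += 1
            start = pos
            while peek() and peek() not in ",()":
                pos += 1
            length = float(s[start:pos])
        return (label, length, tuple(children))

    root = node()
    if pos != len(s):
        raise ValueError("unbalanced or trailing text at position %d" % pos)
    return root


def canonical(node):
    """Sort children recursively so that the written child order no longer matters."""
    label, length, children = node
    return (label, length, tuple(sorted((canonical(c) for c in children), key=repr)))


a = "((A:2, D:4):1, B:1, C:1)"
b = "(C:1, (D:4, A:2):1, B:1)"        # same tree, children written in another order
print(canonical(parse_newick(a)) == canonical(parse_newick(b)))
```

Two strings describe the same tree in this sense when their canonical forms are equal.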
In Charles Darwin’s representation what are the tips and what are the branches?
tips are the species, and the branches are the ancestry
In a phylogeny of simian species, what are the branching events and branch lengths?
branching events are speciation events
branch lengths are the times between speciation events
In a pathogen phylogeny of the HIV epidemic, what are the tips, branching events and branch lengths?
- tips: different infected hosts
- branching events: transmission events
- branch lengths: time between transmission events
What data was used to measure similarity between species previously, and what is used currently?
previously: morphology
currently: typically sequencing data (for species, pathogens, B cells, etc.)
Name 3 ways of defining similarity between species
Phenetic, cladistic, mechanistic
What is phenetic based on?
- it's based on overall similarity
- pairwise distance based
What methods does phenetic use?
UPGMA, least squares algorithm
What is cladistic based on?
- shared characteristics
- character based
What methods does cladistic use?
parsimony
What is mechanistic based on?
- evolutionary model
- character based
What methods does mechanistic use?
maximum likelihood, Bayesian inference
In an alignment, each site is a ?
homolog
How is an alignment obtained?
From raw sequencing reads, by arranging the reads so that the number of mutations, insertions and deletions is minimised
What is the basic idea of distance based methods?
1. Define how to measure the distance between sequences (JC69, etc.)
2. Calculate the distance between all pairs of sequences
3. Find a tree whose distances follow the sequence distance matrix most closely
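Steps 1 and 2 can be sketched in a few lines. The example below assumes aligned sequences of equal length and uses the standard JC69 correction d = -3/4 ln(1 - 4p/3), where p is the fraction of differing sites. The helper names and the toy sequences are hypothetical, and step 3 (finding the tree) is not covered here.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned sites at which the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(a, b):
    """JC69-corrected distance: expected substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:                      # correction is undefined at saturation
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs, dist):
    """Step 2: distance for every unordered pair of sequence names."""
    return {(i, j): dist(seqs[i], seqs[j]) for i, j in combinations(sorted(seqs), 2)}


# Hypothetical toy alignment, for illustration only
toy = {"x": "ACGTACGT", "y": "ACGTACGA", "z": "ACCTAGGA"}
for pair, d in distance_matrix(toy, jc69_distance).items():
    print(pair, round(d, 4))
```

The corrected distance is always at least as large as the raw fraction of differences, because it accounts for multiple substitutions at the same site.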
What are the two strategies of distance based methods?
algorithmic and optimality
How does the algorithmic method work?
It's a sequence of steps in which the sequences with the smallest distances are iteratively clustered into a tree
How does the optimality approach work?
Using a cost function, it minimises the difference between the sequence distance matrix and the inferred tree distances
UPGMA assumes evolution according to what?
a strict molecular clock, in which the rate of DNA/RNA/protein sequence evolution is constant over time
What are the input and output of UPGMA?
input is the distance matrix, output is an ultrametric phylogenetic tree
Find the UPGMA tree for the sequences below, based on the Hamming distance matrix:
s1: TCACACCT
s2: ACAGACTT
s3: AAAGACTT
s4: ACACACCC
Slide 28
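To check a hand-computed answer, the exercise can be run through a small UPGMA sketch. This is an illustrative implementation written for these notes, not the code from the lecture: clusters are keyed by their Newick text, the closest pair is merged at height (distance / 2), and distances to the merged cluster are size-weighted averages. Ties are broken by dictionary order, which is an arbitrary choice.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def upgma(seqs):
    """Return a Newick string for the UPGMA tree built from Hamming distances."""
    size = {name: 1 for name in seqs}          # number of tips in each cluster
    height = {name: 0.0 for name in seqs}      # height of each cluster's node
    dist = {frozenset((a, b)): float(hamming(seqs[a], seqs[b]))
            for a, b in combinations(seqs, 2)}
    while len(size) > 1:
        pair = min(dist, key=dist.get)         # closest pair of clusters
        a, b = sorted(pair)
        h = dist[pair] / 2                     # new node sits halfway between them
        new = f"({a}:{h - height[a]:g},{b}:{h - height[b]:g})"
        for c in size:                         # average distance to other clusters
            if c not in pair:
                dist[frozenset((new, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        dist = {k: v for k, v in dist.items() if not (k & pair)}
        size[new], height[new] = size[a] + size[b], h
        for old in (a, b):
            del size[old], height[old]
    (root,) = size
    return root + ";"


seqs = {"s1": "TCACACCT", "s2": "ACAGACTT", "s3": "AAAGACTT", "s4": "ACACACCC"}
print(upgma(seqs))
```

Running it prints a Newick string with branch lengths that can be compared with the tree on slide 28.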
How does the least squares method work?
It defines a cost function, the sum of squared differences between the sequence distance matrix and the tree distance matrix of a proposed tree, and looks for the tree that minimises it
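The cost for one proposed tree can be written as a short function. This is a sketch under assumptions: both matrices are stored as dictionaries keyed by tip pairs, the optional weights default to 1, and the numbers in the example are made up for illustration. Searching over trees and optimising branch lengths are not shown.

```python
def least_squares_cost(D, T, weights=None):
    """Sum over tip pairs of w_ij * (D_ij - T_ij)^2.

    D: estimated sequence distances, T: distances along the proposed tree,
    both as {(i, j): value}. weights is optional and defaults to 1 per pair.
    """
    cost = 0.0
    for pair, d in D.items():
        w = 1.0 if weights is None else weights[pair]
        cost += w * (d - T[pair]) ** 2
    return cost


# Hypothetical numbers for a three-tip example
D = {("A", "B"): 2.0, ("A", "C"): 4.0, ("B", "C"): 4.5}
T = {("A", "B"): 2.0, ("A", "C"): 4.25, ("B", "C"): 4.25}
print(least_squares_cost(D, T))

# Down-weighting large (noisier) distances with w_ij = 1 / D_ij
w = {pair: 1.0 / d for pair, d in D.items()}
print(least_squares_cost(D, T, w))
```

A least squares method evaluates this cost for candidate trees and keeps the one with the smallest value.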
What is the runtime of UPGMA? How did you find that?
O(n^3) for n sequences.
n: number of pruning steps (each replaces a cherry with a new node)
n^2: work per step to search and update the distance matrix
therefore n * n^2 = n^3 in total
How many rooted trees can n=1,2,3 tips make?
n=1: 1, n=2: 1, n=3: 3
What do we need in order to find the runtime of least squares methods?
We need to optimise the cost function, which means visiting each tree in the space of trees. So we first need to know how many trees on n tips exist: the number of rooted and unrooted trees on n tips.
If we have n tips, how many branches do we have?
2n-3 (in an unrooted binary tree)
How many unrooted trees with n tips exist?
(2n-5)!!
How many rooted trees on n tips exist?
(2n-3)!!
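The double factorials grow very quickly, which is easy to see numerically. A minimal sketch follows; the function names are illustrative, and the formulas are taken as valid for n >= 3 tips (unrooted) and n >= 2 tips (rooted).

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 (or 2); equals 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n):
    """Number of unrooted binary trees on n labelled tips: (2n - 5)!!"""
    return double_factorial(2 * n - 5)


def n_rooted_trees(n):
    """Number of rooted binary trees on n labelled tips: (2n - 3)!!"""
    return double_factorial(2 * n - 3)


for n in (3, 4, 5, 10, 20):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

Note that the number of rooted trees on n tips equals the number of unrooted trees on n + 1 tips, since rooting amounts to attaching one extra branch.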
The least squares decision problem is an ___ problem, thus there is no ___ time algorithm unless ___, so we have to check ___ trees with ___.
NP-complete, polynomial, P=NP, all, n tips
Is UPGMA consistent? Explain further
Yes. As the amount of sequence data grows, the estimated distance matrix tends towards the true tree distances, so we recover the true tree (provided evolution follows the strict molecular clock that UPGMA assumes).
Is the least squares method consistent? Explain further
Yes. The squared difference between the estimated distance matrix and the true tree distances tends towards 0, so the true tree is a least squares tree.
UPGMA and neighbour joining algorithms have a running time of:
polynomial time
What is the running time of least squares methods?
No polynomial-time algorithm is known, because the decision problem is NP-complete
What are two problems of phenetic approaches?
- they disregard information beyond pairwise distances
- large distances come with large variances which are typically ignored.
What are the minimum and maximum numbers of cherries in a rooted phylogenetic tree with 99 tips?
minimum: 1 (a caterpillar tree)
maximum: 49 (49 cherries and one tip left over)
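The two extremes can be constructed and counted directly. The sketch below assumes rooted binary trees stored as nested 2-tuples of tip labels. The builders caterpillar and paired are hypothetical helpers for this illustration: the first adds one tip at a time, the second pairs tips into cherries first and then joins the pieces.

```python
def count_cherries(tree):
    """Count internal nodes whose two children are both tips."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return 1                               # two tips joined by one internal node
    return count_cherries(left) + count_cherries(right)


def caterpillar(tips):
    """Start from one cherry, then attach every further tip above it."""
    tree = (tips[0], tips[1])
    for t in tips[2:]:
        tree = (tree, t)
    return tree


def paired(tips):
    """Pair up tips into as many cherries as possible, then join the pieces."""
    units = [(tips[i], tips[i + 1]) for i in range(0, len(tips) - 1, 2)]
    if len(tips) % 2:
        units.append(tips[-1])                 # odd tip left over
    tree = units[0]
    for u in units[1:]:
        tree = (tree, u)
    return tree


tips = [f"t{i}" for i in range(99)]
print(count_cherries(caterpillar(tips)), count_cherries(paired(tips)))
```

With 99 tips, the paired construction leaves exactly one tip without a partner, which is where the "one left over" comes from.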
In how many ways can you write a Newick string for a rooted tree with species A, B, C? In how many ways can you write it for n species?
4 ways (answer in slides); 2^(n-1) ways for n species
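The count 2^(n-1) comes from the n-1 internal nodes of a rooted binary tree, each of which can list its two children in either order. A short sketch that enumerates the strings, assuming trees are given as nested 2-tuples of tip labels and branch lengths are omitted (the function name is illustrative):

```python
from itertools import product


def newick_strings(tree):
    """All Newick strings for one rooted binary tree (nested 2-tuples of labels)."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    out = []
    for l, r in product(newick_strings(left), newick_strings(right)):
        out.append(f"({l},{r})")               # children in the given order
        out.append(f"({r},{l})")               # children swapped
    return out


strings = newick_strings((("A", "B"), "C"))
print(len(strings))
for s in strings:
    print(s)
```

For the tree ((A,B),C) this lists the four strings from the flashcard answer; each additional tip adds one internal node and doubles the count.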
Consider the least squares method. Why would we use weights w_ij which aren't equal to 1?
Because we don't have infinite amounts of sequence data, the estimated distances are noisy. We down-weight the entries of the distance matrix that carry a lot of noise, e.g. w_ij = 1/D_ij, where D is the estimated pairwise distance matrix.