Substitution models - phylogenetics Flashcards

Question

How do you calculate the branch length between two taxa and the new cluster they were joined to create?

Answer 1

Half the distance between the two joined elements: D(a,b) / 2.

Answer 2

The result is an ultrametric rooted tree. Ultrametric means that every taxon is the same distance from the root. This assumption makes the algorithm sensitive to unequal rates among lineages: if some taxa evolve faster than others in the tree, UPGMA can return the wrong topology.
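The two UPGMA cards above can be illustrated with a minimal Python sketch. It assumes the usual average-linkage definition of the distance between clusters and places each new node at half that distance, so every leaf under a node sits at the same depth. The function name `upgma`, the input layout, and the three-taxon distances are hypothetical choices made for the example.

```python
from itertools import combinations

def upgma(D):
    """D maps frozenset({a, b}) -> distance between leaf labels a and b.
    Returns a list of (sorted cluster members, node height) in merge order."""
    leaves = sorted({x for pair in D for x in pair})
    clusters = [frozenset([x]) for x in leaves]

    def avg(A, B):
        # Average distance over all leaf pairs drawn from the two clusters.
        return sum(D[frozenset([a, b])] for a in A for b in B) / (len(A) * len(B))

    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters.
        A, B = min(combinations(clusters, 2), key=lambda p: avg(*p))
        # The new node sits at half the distance between the joined clusters.
        merges.append((sorted(A | B), avg(A, B) / 2))
        clusters = [C for C in clusters if C not in (A, B)] + [A | B]
    return merges

# Hypothetical distances for three taxa.
D = {frozenset("ab"): 2.0, frozenset("ac"): 6.0, frozenset("bc"): 6.0}
merges = upgma(D)
```

Here a and b join at height 1.0 and the root sits at height 3.0, so a, b, and c are all 3.0 from the root. That equal root-to-tip distance is the ultrametric property, and it is also why a fast-evolving lineage can mislead the method.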

Answer 3

The NJ algorithm is also based on a distance matrix of n taxa. Unlike UPGMA, NJ allows for unequal rates of evolution, so branch lengths are proportional to the amount of change, and the result is an unrooted tree. The algorithm is slightly more complex and slower than UPGMA.

Answer 4

The key concept of the NJ algorithm is that divergence is a measure of total branch length in the neighbor-joining process.

Answer 5

1. Start from the distance matrix.
2. Create the Q matrix.
3. Join the pair of taxa with the smallest Q value into a new cluster.
4. Calculate the branch lengths from the joined taxa to the new node.
5. Calculate the distances from the new node to the other taxa in a new distance matrix.
6. Repeat until done.

Answer 6

The distance matrix (D) is not used directly as in UPGMA. Instead, it is used to create a new matrix (Q) that holds the net divergence of each taxon pair. Divergence in this case is a measure of total branch length in the neighbor-joining process.

Answer 7

If you have joined c and d into the new cluster u:

(c,u) = 0.5 x D(c,d) + (Rc – Rd) / (2(n-2))
(d,u) = D(c,d) – (c,u)

In NJ the branch length starts from the original distance divided by two, just as in UPGMA, but the second term preserves the remaining divergence.

Answer 8

Q(x,y) = (n-2)D(x,y) – Rx – Ry, where Rx is the sum of all distances from x to the other taxa, n is the number of taxa in the current matrix, and (n-2) is the degrees of freedom.

Answer 9

The same way as in UPGMA, but we also subtract the original distance: D(u,e) = (D(c,e) + D(d,e) – D(c,d)) / 2
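The NJ steps and formulas from the cards above (Q matrix, branch lengths, distance update) can be put together in a short Python sketch. This is an illustration under stated assumptions, not a reference implementation: the function name `neighbor_joining`, the nested-dict input, the internal node labels `u0`, `u1`, ..., and the five-taxon matrix are all choices made for the example, and ties in Q are broken by taking the first minimum found.

```python
from itertools import combinations

def neighbor_joining(names, D):
    """names: taxon labels; D: symmetric nested dict, D[x][y] = distance.
    Returns the edges of the unrooted tree as (node, node, length)."""
    D = {x: dict(row) for x, row in D.items()}  # work on a copy
    nodes, edges, next_id = list(names), [], 0
    while len(nodes) > 2:
        n = len(nodes)
        # Rx: sum of the distances from x to every other current node.
        R = {x: sum(D[x][y] for y in nodes if y != x) for x in nodes}
        # Pick the pair with the smallest Q(x,y) = (n-2)D(x,y) - Rx - Ry.
        c, d = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - R[p[0]] - R[p[1]])
        u = f"u{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new node u.
        len_c = 0.5 * D[c][d] + (R[c] - R[d]) / (2 * (n - 2))
        len_d = D[c][d] - len_c
        edges += [(c, u, len_c), (d, u, len_d)]
        # Distances from u to every remaining node e.
        D[u] = {}
        for e in nodes:
            if e not in (c, d):
                D[u][e] = D[e][u] = (D[c][e] + D[d][e] - D[c][d]) / 2
        nodes = [x for x in nodes if x not in (c, d)] + [u]
    # The last two nodes are connected by their remaining distance.
    a, b = nodes
    edges.append((a, b, D[a][b]))
    return edges

# Hypothetical five-taxon distance matrix.
names = ["a", "b", "c", "d", "e"]
rows = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
D = {x: dict(zip(names, row)) for x, row in zip(names, rows)}
edges = neighbor_joining(names, D)
```

On this matrix the first pair joined is a and b (Q = -50), with branch lengths 2 and 3 to the new node, so the two lengths differ even though they sum to D(a,b) = 5. Under UPGMA both would have been 2.5.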

Answer 10

Maximum parsimony is an optimality criterion. The principle is to find the tree that minimizes the number of evolutionary changes. It rests on the assumption that evolution is "lazy" and takes the shortest path; in other words, the shortest tree is taken to be the most correct tree. This tree must be found among all possible trees. For a small number of taxa (fewer than about 10) an exhaustive search is possible, but for more taxa a heuristic search must be used. The method is sensitive to unequal rates among lineages, which can cause long-branch attraction.
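To make "number of evolutionary changes" concrete, here is a minimal Python sketch that counts the minimum number of changes one site requires on a given rooted binary tree. It uses Fitch's small-parsimony algorithm, which the card does not name and which is swapped in here purely as an illustration. The function name `fitch`, the nested-tuple tree encoding, and the example site are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.
    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping leaf name -> observed character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_changes + right_changes + 1

# One hypothetical alignment column for four taxa.
site = {"A": "A", "B": "A", "C": "G", "D": "G"}
score_1 = fitch((("A", "B"), ("C", "D")), site)[1]
score_2 = fitch((("A", "C"), ("B", "D")), site)[1]
```

The first tree needs 1 change for this site and the second needs 2, so parsimony prefers the first. A full analysis would sum such scores over all sites and compare them across candidate trees.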

Answer 11

Maximum likelihood is a statistical approach that lends itself to various statistical tests of phylogenies, such as the likelihood ratio test. A likelihood analysis strives to find the tree that has the highest likelihood given the data (nucleotide or amino acid sequences) and a model of sequence evolution. Parsimony, by contrast, aims to find the tree that requires the lowest number of evolutionary changes.

Answer 12

Likelihood: the likelihood measures the probability of observing the data given that the model is true, P(D|M). Posterior probability: the probability of the model given the data, P(M|D), obtained by combining the likelihood with the prior probability P(M) (what you knew about the model before seeing the data). Maximum likelihood finds the tree that is most likely given our data and parameters such as an evolutionary model, whereas Bayesian statistics works with the posterior probability and therefore also includes our prior knowledge.
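The relationship between likelihood, prior, and posterior can be shown with a small Python sketch of Bayes' rule over a discrete set of candidate models. The function name `posterior` and all the numbers are made up for illustration. Real tree likelihoods are far smaller and are usually handled as logarithms.

```python
def posterior(likelihoods, priors):
    """Bayes' rule over a discrete set of models.
    likelihoods[m] = P(D|m), priors[m] = P(m); returns P(m|D) for each m."""
    joint = {m: likelihoods[m] * priors[m] for m in likelihoods}
    evidence = sum(joint.values())  # P(D), the normalizing constant
    return {m: j / evidence for m, j in joint.items()}

# Hypothetical likelihoods for two candidate trees.
likelihoods = {"T1": 0.02, "T2": 0.01}

flat = posterior(likelihoods, {"T1": 0.5, "T2": 0.5})
informed = posterior(likelihoods, {"T1": 0.2, "T2": 0.8})
```

With a flat prior the posterior favors T1, the same tree maximum likelihood would pick. With a prior that strongly favors T2, the posterior favors T2 instead, which is the sense in which the Bayesian answer includes prior knowledge.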

Answer 13

Markov chain Monte Carlo (MCMC) is used to sample from the posterior distribution in Bayesian statistics.
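As a rough illustration of how MCMC samples from a posterior, the Python sketch below runs a Metropolis sampler over three hypothetical trees whose posterior is known only up to a constant. It assumes a symmetric proposal (a uniformly random tree) and skips burn-in and convergence checks. The function name `metropolis` and the weights are invented for the example.

```python
import random

def metropolis(weight, states, steps, seed=0):
    """Sample states in proportion to weight[state] (unnormalized posterior).
    Returns the fraction of steps spent in each state."""
    rng = random.Random(seed)
    current = states[0]
    counts = {s: 0 for s in states}
    for _ in range(steps):
        proposal = rng.choice(states)  # symmetric proposal
        # Accept with probability min(1, weight[proposal] / weight[current]).
        if rng.random() < weight[proposal] / weight[current]:
            current = proposal
        counts[current] += 1
    return {s: c / steps for s, c in counts.items()}

# Hypothetical unnormalized posterior values for three trees.
weight = {"T1": 1.0, "T2": 2.0, "T3": 7.0}
freqs = metropolis(weight, ["T1", "T2", "T3"], steps=20000)
```

The sampled frequencies should approach 0.1, 0.2, and 0.7 without the normalizing constant ever being computed, which is why the approach is practical when the space of trees is too large to enumerate.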

Answer 14

A likelihood is the probability of the observed data given the model, P(D|M). The posterior probability is the probability of the model given the likelihood and the prior probabilities. You need the prior probability P(M).

Answer 15

Clustering methods are faster because they build a single tree through a fixed sequence of steps, follow fewer rules, and make fewer assumptions; UPGMA, for example, simply joins the most similar (shortest-distance) groups. Optimality-criterion methods are slower because they must evaluate many candidate trees and, as in ML, find the best-fitting model of evolution and calculate a likelihood. In return, they offer more robust and statistically grounded approaches for inferring phylogenetic trees and estimating branch lengths and model parameters.

Answer 16

Bootstrapping is used to assess how much confidence to place in your derived tree. The alignment columns are resampled with replacement many times, a tree is derived from each resampled data set, and you count how often each split is recovered. A split that occurs in a high proportion of replicates is well supported by the data rather than being a product of chance.
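The two mechanical parts of a bootstrap analysis can be sketched in Python: drawing alignment columns with replacement, and counting how often a split appears across the replicate trees. Tree building itself is left out. The function names `resample_columns` and `split_support`, the toy alignment, and the replicate splits are all hypothetical.

```python
import random

def resample_columns(alignment, rng):
    """alignment: dict name -> sequence (all the same length).
    Returns a pseudo-alignment built from columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def split_support(replicate_splits, split):
    """Fraction of replicate trees (each given as a set of splits) containing split."""
    return sum(split in splits for splits in replicate_splits) / len(replicate_splits)

# Toy alignment with four taxa and four sites.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TGGA"}
pseudo = resample_columns(alignment, random.Random(1))

# Hypothetical splits recovered from four replicate trees.
replicates = [{frozenset("AB")}, {frozenset("AB")}, {frozenset("AC")}, {frozenset("AB")}]
support = split_support(replicates, frozenset("AB"))
```

In this made-up example the split grouping A with B is recovered in 3 of 4 replicates, giving a support of 0.75. In practice hundreds or thousands of replicates are typical.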


(40 cards)