DISTANCE METHODS (UPGMA & NJ) Flashcards
Two distance-based tree methods
UPGMA - Unweighted Pair Group Method with Arithmetic mean
NJ - Neighbor-Joining
When two sequences are similar, they are likely to originate from the same ancestor
distance-based methods
sequence similarity can approximate evolutionary distances
distance-based methods
assume that for any pair of species we have an estimate of the evolutionary distance between them
e.g. derived from an alignment score
distance-based methods
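A minimal sketch of how such a distance estimate could be obtained, assuming the sequences are already aligned to equal length. It uses the p-distance (fraction of differing ungapped sites) rather than an alignment score; the function names and the toy sequences are hypothetical and only for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    # Skip columns where either sequence has a gap character.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped aligned positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs: dict) -> dict:
    """Symmetric pairwise p-distances keyed by (name1, name2)."""
    d = {}
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        d[(n1, n2)] = d[(n2, n1)] = p_distance(s1, s2)
    return d


# Toy alignment (made-up data).
aligned = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAAC-A"}
D = distance_matrix(aligned)
```

The p-distance underestimates the true distance for diverged sequences, so in practice a corrected distance is often used instead; that choice is outside what these cards cover.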
goal is to construct a tree which best approximates these distances
distance-based methods
distance based tree
pls watch the video
Can one always represent a distance matrix as a weighted tree?
No. In the example, there is no way to add d to the tree and preserve the distances
REMEMBER:
Real matrices are almost never additive
ok
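One way to test whether a matrix is additive (exactly representable as a weighted tree) is the four-point condition, a standard result that these cards do not state: for every four taxa, the two largest of the three pairwise sums must be equal. The sketch below assumes a symmetric list-of-lists matrix with a zero diagonal; the function name and the example matrices are made up for illustration.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Four-point condition check on a symmetric distance matrix."""
    for i, j, k, l in combinations(range(len(d)), 4):
        # The three ways of splitting four taxa into two pairs.
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        # Additive only if the two largest sums coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Distances read off a tree with leaf branches a=1, b=2, c=1, d=4
# and an internal branch of length 3 separating (a, b) from (c, d).
tree_like = [[0, 3, 5, 8],
             [3, 0, 6, 9],
             [5, 6, 0, 5],
             [8, 9, 5, 0]]

# Same matrix with d(a, d) changed from 8 to 7: no tree fits exactly.
perturbed = [[0, 3, 5, 7],
             [3, 0, 6, 9],
             [5, 6, 0, 5],
             [7, 9, 5, 0]]

print(is_additive(tree_like), is_additive(perturbed))
```

With fewer than four taxa the loop body never runs, so the function returns True.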
used to search stochastically for the best-scoring trees in tree space
heuristics
heuristics
UPGMA and NJ
clustering problem: group items with similar properties
- clusters are homogeneous
- clusters are well separated
hierarchical clustering
many clusters have natural sub-clusters which are often easier to identify
hierarchical clustering
combine hierarchical clustering with a method to put weights on the edges
UPGMA
rooted tree with edge lengths where all leaves are equidistant from the root
ultrametric tree
often represents the molecular clock hypothesis, which states that the rate of mutation is the same across all lineages of the tree
ultrametric tree
the distance from any internal node to each of its descendant leaves is the same
ultrametric tree
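Whether a distance matrix is consistent with an ultrametric tree can be checked with the three-point condition, a standard result not stated on these cards: in every triple of taxa, the two largest pairwise distances must be equal. A minimal sketch, assuming a symmetric list-of-lists matrix; the function name and the example matrices are hypothetical.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition check on a symmetric distance matrix."""
    for i, j, k in combinations(range(len(d)), 3):
        s = sorted([d[i][j], d[i][k], d[j][k]])
        # Ultrametric only if the two largest distances in the triple tie.
        if abs(s[2] - s[1]) > tol:
            return False
    return True


# a and b are 2 apart; c is 6 from both (consistent with a clock).
clock_like = [[0, 2, 6],
              [2, 0, 6],
              [6, 6, 0]]

# b is now further from c than a is, so the rates differ.
not_clock_like = [[0, 2, 6],
                  [2, 0, 7],
                  [6, 7, 0]]

print(is_ultrametric(clock_like), is_ultrametric(not_clock_like))
```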
- assumes the same rate of evolution in all lineages (molecular clock hypothesis)
UPGMA
the length from the root to each leaf is the same (ultrametric)
UPGMA
it is similar to the Fitch-Margoliash algorithm (merge the two most similar sequences or clusters first), but the calculation of branch lengths is even simpler
UPGMA
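The UPGMA cards above can be sketched as a short program. This is an illustrative implementation under assumptions, not a reference one: it takes a symmetric list-of-lists matrix, merges the closest pair of clusters, places the new node at half their distance, and updates distances with a size-weighted average. The function name, the Newick output, and the example matrix are choices made here.

```python
def upgma(names, d):
    """Return (newick_string, root_height) for a symmetric distance matrix d."""
    # Active clusters: id -> (newick label, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id).
    dist = {(i, j): d[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)

    while len(clusters) > 1:
        # Merge the two closest clusters.
        (a, b), dab = min(dist.items(), key=lambda kv: kv[1])
        (la, na, ha), (lb, nb, hb) = clusters.pop(a), clusters.pop(b)
        height = dab / 2  # the new node sits halfway between the two clusters
        label = f"({la}:{height - ha:g},{lb}:{height - hb:g})"

        # Distance from the merged cluster to each remaining cluster:
        # average over all leaf pairs, i.e. a size-weighted mean.
        new_dist = {}
        for c in clusters:
            dac = dist[tuple(sorted((a, c)))]
            dbc = dist[tuple(sorted((b, c)))]
            new_dist[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)

        # Drop every entry that involved a or b, then add the new ones.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = (label, na + nb, height)
        next_id += 1

    (label, _, height), = clusters.values()
    return label + ";", height


newick, root_height = upgma(["a", "b", "c"],
                            [[0, 2, 6],
                             [2, 0, 6],
                             [6, 6, 0]])
print(newick, root_height)
```

Because every new node is placed at half the merge distance, all leaves end up equidistant from the root, which is the ultrametric property described on the cards.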
minimum evolution: seeks the tree with the least total branch length (distance-based)
NJ
bottom-up clustering method
NJ
does not assume the same rate of evolution in all lineages
NJ
fast and produces reasonable trees
NJ
Saitou & Nei algorithm
NJ
- Calculate pairwise distances
- Create distance matrix
- Determine net divergence for each terminal node
- Create rate-corrected distance matrix
- Identify taxa with minimum rate-corrected distance
- Connect taxa with minimum rate-corrected distance via a new node, and determine their distance from this new node
- Determine the distance of new node from rest of taxa or nodes
- Regenerate distance matrix
- Return to step 2 and repeat until all nodes are joined
NJ
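The NJ steps listed above can be sketched as follows. This is a minimal illustration under assumptions: net divergence is taken as r_i = sum of distances from i to all other active nodes, and the rate-corrected distance as d_ij - (r_i + r_j)/(N - 2), which is one common way to write it. The function name, the internal labels node0, node1, ... (assumed not to clash with taxon names), and the example matrix are hypothetical.

```python
def neighbor_joining(names, d):
    """Return the unrooted NJ tree as a list of (node, node, branch_length) edges."""
    nodes = list(names)
    # Distance lookup keyed by ordered pairs of node labels.
    dist = {(a, b): d[i][j]
            for i, a in enumerate(names)
            for j, b in enumerate(names) if i != j}
    edges = []
    counter = 0

    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each active node.
        r = {a: sum(dist[a, b] for b in nodes if b != a) for a in nodes}
        # Pick the pair with the minimum rate-corrected distance.
        pairs = [(a, b) for idx, a in enumerate(nodes) for b in nodes[idx + 1:]]
        i, j = min(pairs, key=lambda p: dist[p] - (r[p[0]] + r[p[1]]) / (n - 2))

        # Join i and j through a new internal node u.
        u = f"node{counter}"
        counter += 1
        li = dist[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[i, j] - li
        edges += [(i, u, li), (j, u, lj)]

        # Distance from u to every remaining node.
        for k in nodes:
            if k not in (i, j):
                dist[u, k] = dist[k, u] = (dist[i, k] + dist[j, k] - dist[i, j]) / 2

        # Regenerate the set of active nodes.
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Two nodes remain: connect them directly.
    a, b = nodes
    edges.append((a, b, dist[a, b]))
    return edges


# Additive example: leaf branches a=1, b=2, c=1, d=4, internal branch 3.
edges = neighbor_joining(["a", "b", "c", "d"],
                         [[0, 3, 5, 8],
                          [3, 0, 6, 9],
                          [5, 6, 0, 5],
                          [8, 9, 5, 0]])
for edge in edges:
    print(edge)
```

On this additive input the recovered branch lengths match the tree the distances were read from, even though a and b evolve at different rates; ties in the rate-corrected distance are broken by taking the first pair found.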