7 - Trees and distance methods Flashcards

Question 1

Q

What are four different methods for inferring phylogeny?

Answer

A

Distance matrix methods: pairwise distances between all sequences in alignment
Parsimony-based methods
Maximum likelihood methods
Bayesian methods

Question 2

Q

What are the two steps to distance matrix analysis of phylogeny?

Answer

A

Calculate the pairwise distances between all species (from the trimmed multiple alignment) to make a distance matrix
Infer the phylogenetic tree from the distance matrix, either with an algorithmic method (e.g. NJ, UPGMA) or with an optimality criterion method (e.g. least squares)
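
As a minimal sketch of the first step, the Python below builds a distance matrix from a small alignment. The three sequences and the names distance_matrix and proportion_different are made up for illustration, and the distance used is the plain observed proportion of differences rather than a model-corrected one.

```python
from itertools import combinations


def proportion_different(a, b):
    """Observed proportion of aligned positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(alignment):
    """Return a symmetric dict-of-dicts of pairwise distances."""
    names = list(alignment)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = proportion_different(alignment[n], alignment[m])
        matrix[n][m] = matrix[m][n] = d
    return matrix


# Hypothetical gap-free alignment of three taxa (not real data).
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGTTG",
}
dm = distance_matrix(alignment)
for name, row in dm.items():
    print(name, [round(row[other], 2) for other in alignment])
```

The resulting matrix is what a tree-building method would take as input in the second step.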

Question 3

Q

How do you calculate p (the normalised Hamming distance)?

Answer

A

p = (number of observed differences) / (number of positions in the sequence)

or
p = 1 - (proportion of identical sites) = 1 - identity

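This is the observed proportion of differences. It underestimates the true amount of change, because multiple substitutions at the same site are hidden, so it is corrected with a substitution model to recover an accurate tree distance (D)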
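
The two formulas for p give the same number, which the following sketch checks on a made-up pair of aligned sequences. The function names identity and p_distance are illustrative only.

```python
def identity(a, b):
    """Proportion of aligned positions that are identical."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def p_distance(a, b):
    """p = 1 - identity."""
    return 1.0 - identity(a, b)


# Hypothetical aligned sequences with 3 differences in 10 positions.
seq1 = "ACGTACGTAC"
seq2 = "ACGAACGTTG"

differences = sum(x != y for x, y in zip(seq1, seq2))
p_from_counts = differences / len(seq1)
p = p_distance(seq1, seq2)
print(differences, round(p_from_counts, 3), round(p, 3))
```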

Question 4

Q

List the different models for calculating tree distance

Answer

A

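Jukes-Cantor one-parameter model (all substitutions occur at equal rates)
D = -(3/4) ln(1 - (4/3)p)
Kimura 2-parameter model (transition and transversion rates differ; P = proportion of transitional differences, Q = proportion of transversional differences)
D = (1/2) ln(1 / (1 - 2P - Q)) + (1/4) ln(1 / (1 - 2Q))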

Both models can be combined with a gamma distribution of rates across sites, which describes the proportions of sites with slow, medium and fast substitution (evolution) rates.

The shape of the distribution is governed by the shape parameter, alpha (α).
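
The two corrections can be written as short Python functions. This is a sketch rather than a reference implementation: the function names are illustrative, and the input checks simply reject values for which the logarithm is undefined.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance: D = -(3/4) ln(1 - (4/3)p)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(P, Q):
    """Kimura 2-parameter distance.

    P is the proportion of transitional differences,
    Q is the proportion of transversional differences.
    """
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("distance is undefined for these P and Q")
    return 0.5 * math.log(1 / a) + 0.25 * math.log(1 / b)


# The corrected distance is larger than the observed proportion p.
print(round(jukes_cantor(0.3), 4))
# With one third of differences being transitions, K2P reduces to JC.
print(round(kimura_2p(0.1, 0.2), 4))
```

When transversions are twice as frequent as transitions (Q = 2P), the two models give the same distance, which is a handy sanity check on the K2P formula.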

Question 5

Q

What happens when you ignore among-site rate variation in finding tree distance (D)?

Answer

A

UNDERestimation of actual distance.
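
A numerical illustration, assuming the standard gamma-corrected form of the Jukes-Cantor distance, D = (3/4) α [(1 - (4/3)p)^(-1/α) - 1]. That formula is not given on these cards, and the values of p and α below are arbitrary.

```python
import math


def jc(p):
    """Jukes-Cantor distance with equal rates at all sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_gamma(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    Assumed standard form: D = (3/4) * alpha * ((1 - 4p/3) ** (-1/alpha) - 1).
    """
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


p = 0.3  # arbitrary observed proportion of differences
for alpha in (0.5, 1.0, 5.0, 1000.0):
    print(alpha, round(jc_gamma(p, alpha), 4), round(jc(p), 4))
```

For the same p, the distance that ignores rate variation is smaller than the gamma-corrected one, and the gap widens as α gets smaller. For very large α the two agree.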

Question 6

Q

Give two classes of methods for inferring a tree from the distance matrix

Answer

A

Algorithmic methods

Unweighted pair group method with arithmetic mean (UPGMA)
Neighbour joining
BIONJ and WEIGHBOR

Optimality criterion-based methods

Minimum evolution
Fitch-Margoliash
Least-squares

Question 7

Q

Describe the UPGMA method for tree reconstruction

Answer

A

Assumes the rate of evolution is constant in all organismal lineages, so that distance (D) is a linear function of time (T). In other words, it assumes a molecular clock.

It assumes the distances are ultrametric, which they typically are not.

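Starts by clustering the pair of taxa with the smallest distance and placing their branching point at half that distance. The distances from the new cluster to all other taxa are then averaged, the next smallest distance is found and its branching point is calculated, and so on until all taxa are joined.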
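
The clustering loop can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions: distances are stored in a dict keyed by frozenset pairs, trees are returned as nested tuples (left, right, node_height), and the four-taxon matrix is made up and deliberately ultrametric.

```python
def upgma(dist):
    """Cluster taxa by UPGMA and return the root as (left, right, height)."""
    # Active clusters mapped to the number of leaves they contain.
    sizes = {name: 1 for pair in dist for name in pair}
    d = dict(dist)
    while len(sizes) > 1:
        # 1. Find the closest pair of active clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair, key=str)  # fixed order for reproducible output
        # 2. Under a molecular clock the branch point is at half the distance.
        merged = (a, b, d[pair] / 2)
        # 3. Distance to every other cluster is the size-weighted average
        #    of the distances from the two clusters being merged.
        for c in sizes:
            if c in (a, b):
                continue
            dac = d[frozenset((a, c))]
            dbc = d[frozenset((b, c))]
            total = sizes[a] + sizes[b]
            d[frozenset((merged, c))] = (sizes[a] * dac + sizes[b] * dbc) / total
        # 4. Retire the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Hypothetical ultrametric distances between four taxa.
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree = upgma(dist)
print(tree)
```

Because this example matrix is ultrametric, the node heights (1, 3 and 5) are exactly half the corresponding distances. With non-ultrametric data the same loop still runs, but the tree it returns can be wrong.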

Question 8

Q

How do distance matrix programs treat alignment columns that contain gaps?

Answer

A

They delete them.

Gaps in the alignment are otherwise coded as ? or '-'
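
A small sketch of deleting gap-containing columns before distances are calculated. The function name drop_gap_columns and the example alignment are made up, and both ? and - are treated as gap symbols here.

```python
def drop_gap_columns(alignment, gap_chars="-?"):
    """Remove every alignment column in which any sequence has a gap."""
    names = list(alignment)
    columns = zip(*(alignment[n] for n in names))
    kept = [col for col in columns if not any(ch in gap_chars for ch in col)]
    # Rebuild each sequence from the columns that survived.
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}


# Hypothetical alignment: columns 3 and 4 each contain a gap.
alignment = {
    "A": "ACG-TA",
    "B": "AC-TTA",
    "C": "ACGTTA",
}
trimmed = drop_gap_columns(alignment)
print(trimmed)
```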

Question 9

Q

List four problems with UPGMA

Answer

A

Assumes the data reflects an ultrametric tree
Tends to move more divergent sequences deeper into the tree (long branch attraction artefact)

LBA is one of the biggest pitfalls in molecular phylogeny

Question 10

Q

What is long branch attraction?

Answer

A

Long branch attraction (LBA) causes species to seem more closely related in a phylogeny than they really are. It arises when lineages evolve faster than the others, so that the same mutations or traits occur in them independently (convergent evolution). These shared traits can be misinterpreted as being shared due to common ancestry.

UPGMA is especially bad at this.

Question 11

Q

Describe the neighbour-joining method of tree reconstruction

Answer

A

Unlike UPGMA, NJ does not require a molecular clock, only additive distances (i.e. that distances between taxa can be represented by a tree structure)

This allows rates to vary in different lineages
An algorithm that seeks out neighbours (closest pairs of sequences)
Starts with a star tree and sequentially pairs up taxa to minimize the total branch length implied by the tree (lowest score is best)
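
One round of neighbour joining can be sketched as follows, assuming the standard Q criterion, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from taxon i to all other taxa. The cards do not spell this criterion out, and the five-taxon matrix and the name nj_step are illustrative.

```python
from itertools import combinations


def nj_step(taxa, d):
    """Pick the pair of neighbours to join and their branch lengths."""
    n = len(taxa)
    # Row sums: total distance from each taxon to all the others.
    r = {i: sum(d[frozenset((i, k))] for k in taxa if k != i) for i in taxa}
    # Q score for every pair; the lowest score is joined first.
    q = {
        frozenset((i, j)): (n - 2) * d[frozenset((i, j))] - r[i] - r[j]
        for i, j in combinations(taxa, 2)
    }
    i, j = sorted(min(q, key=q.get))
    dij = d[frozenset((i, j))]
    # Branch lengths from i and j to their new internal node.
    li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
    lj = dij - li
    return (i, j), (li, lj), q


# Hypothetical additive distances between five taxa.
taxa = ["a", "b", "c", "d", "e"]
raw = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
d = {frozenset(pair): float(value) for pair, value in raw.items()}
joined, lengths, q = nj_step(taxa, d)
print(joined, lengths)
```

In this example d and e have the smallest raw distance, yet a and b get the lowest Q score and are joined first, with unequal branch lengths. That is how NJ accommodates different rates in different lineages. A full implementation would replace the joined pair with the new node, recompute distances to it, and repeat until the tree is resolved.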

(11 cards)